THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


PAGE 


On the Sampling Theory of the Roots of Determinantal Equations.
M. A. GIRSHICK


An Optimum Property of Confidence Regions Associated with the
Likelihood Function. S. S. WILKS AND J. F. DALY


On Some Properties of Multidimensional Distributions. J. LUKOMSKI


On a Class of Distributions that Approach the Normal Distribution
Function. GEORGE B. DANTZIG


The Length of the Cycles Which Result from the Graduation of
Chance Elements. EDWARD L. DODD


On the Distribution of the "Student" Ratio for Small Samples from
Certain Non-Normal Populations. H. L. RIETZ


The Problem of m Rankings. M. G. KENDALL AND B. BABINGTON SMITH


Notes:


The Allocation of Samplings Among Several Strata. J. STEVENS
STOCK AND LESTER R. FRANKEL


On the Coefficients of the Expansion of X™.

On the Probability of Attaining a Given Standard Deviation Ratio in
an Infinite Series of Trials. JOSEPH A. GREENWOOD AND T. N. E.
GREVILLE


Vol. X, No. 3 — September, 1939 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor


A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. CARVER        R. A. FISHER        R. DE MISES
H. CRAMÉR           T. C. FRY           E. S. PEARSON
W. E. DEMING        H. HOTELLING        H. L. RIETZ
G. DARMOIS          W. A. SHEWHART


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.


The subscription price for the ANNALS is $4.00 per year. Single copies $1.25.
Back numbers are available at the following rates:


Vols. I-IV $5.00 each. Single numbers $1.50. 
Vols. V to date $4.00 each. Single numbers $1.25. 


Subscriptions, renewals, orders for back numbers and other business
communications should be sent to A. T. Craig, University of Iowa, Iowa City, Iowa.


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.













ON THE SAMPLING THEORY OF ROOTS OF DETERMINANTAL 
EQUATIONS 


By M. A. GIRSHICK¹


In a recent paper² Hotelling has considered two functions of the covariances
of two sets of variates (having a multivariate normal distribution with s variates
in the first set, t variates in the second, $s \le t$) which he designates by Q and Z
and which he defines as follows:

(1.1)    $Q^2 = (-1)^s \dfrac{C}{AB}$    and³    $Z = \dfrac{D}{AB}$

where A is the determinant of the covariances among the variates of the first
set, B the determinant of the covariances among the variates of the second set,
D the determinant of covariances of the two sets taken together, and C a
determinant obtained from D by replacing the covariances among the variates of the
first set by zeros. Both $Q^2$ and Z are shown to be invariant under internal
linear transformations of either set of variates.

In solving the problem of determining linear functions of the two sets of
variates for which the multiple correlation is a maximum, Hotelling arrives at a
set of parameters $\rho_1, \rho_2, \cdots, \rho_s$ which he names "canonical correlations" and
which are the positive or zero roots of the determinantal polynomial

(1.2)    $D(\lambda) = \begin{vmatrix} -\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1s} & \sigma_{1,s+1} & \cdots & \sigma_{1,s+t} \\ \vdots & & \vdots & \vdots & & \vdots \\ -\lambda\sigma_{s1} & \cdots & -\lambda\sigma_{ss} & \sigma_{s,s+1} & \cdots & \sigma_{s,s+t} \\ \sigma_{s+1,1} & \cdots & \sigma_{s+1,s} & -\lambda\sigma_{s+1,s+1} & \cdots & -\lambda\sigma_{s+1,s+t} \\ \vdots & & \vdots & \vdots & & \vdots \\ \sigma_{s+t,1} & \cdots & \sigma_{s+t,s} & -\lambda\sigma_{s+t,s+1} & \cdots & -\lambda\sigma_{s+t,s+t} \end{vmatrix}$

The $\rho$'s are equal in number to the variates of the first set and bear the
following relations to Q and Z:

(1.3)    $Q = \rho_1 \rho_2 \cdots \rho_s$
(1.4)    $Z = (1 - \rho_1^2)(1 - \rho_2^2) \cdots (1 - \rho_s^2)$.
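
As a numerical illustration (not part of the original argument), the relations (1.3) and (1.4) may be checked for an arbitrary covariance matrix. The sketch below assumes the reading $Q^2 = (-1)^s C/(AB)$, $Z = D/(AB)$ of (1.1) and computes the squared canonical correlations as the latent roots of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$; the function name `population_q2_z` and the random test matrix are hypothetical.

```python
import numpy as np

def population_q2_z(sigma, s):
    """Return (Q^2, Z, rho^2) when the first s variates form the first set."""
    sigma = np.asarray(sigma, dtype=float)
    s11, s12 = sigma[:s, :s], sigma[:s, s:]
    s21, s22 = sigma[s:, :s], sigma[s:, s:]
    A = np.linalg.det(s11)           # covariances within the first set
    B = np.linalg.det(s22)           # covariances within the second set
    D = np.linalg.det(sigma)         # both sets taken together
    c_mat = sigma.copy()
    c_mat[:s, :s] = 0.0              # first-set covariances replaced by zeros
    C = np.linalg.det(c_mat)
    q2 = (-1) ** s * C / (A * B)     # assumed form of (1.1)
    z = D / (A * B)
    # squared canonical correlations as latent roots (an assumption of this sketch)
    m = np.linalg.solve(s11, s12) @ np.linalg.solve(s22, s21)
    rho2 = np.sort(np.linalg.eigvals(m).real)[::-1]
    return q2, z, rho2

rng = np.random.default_rng(0)
g = rng.standard_normal((5, 8))
sigma = g @ g.T                      # a random positive-definite 5 x 5 covariance matrix
q2, z, rho2 = population_q2_z(sigma, s=2)
print(q2, np.prod(rho2))             # Q^2 against the product of the rho_i^2
print(z, np.prod(1.0 - rho2))        # Z against the product of the (1 - rho_i^2)
```

Each printed pair should agree to rounding error, whatever positive-definite matrix is supplied.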


The corresponding functions for the sample covariances Hotelling designates
by q and z, and the sample canonical correlations by $r_1, r_2, \cdots, r_s$. Under
the assumption of complete independence between the two sets of variates and


1 Most of this research was accomplished at Columbia University under a Grant-in-Aid
from the Carnegie Corporation of New York.

2 Harold Hotelling, "Relations Between Two Sets of Variates," Biometrika, Vol.
XXVIII, Dec. 1936.

3 The function Z was first considered by S. S. Wilks in Biometrika, Vol. XXIV, Nov. 1932.




in the case s = 2 and t = 2, he shows that the joint distribution of q and z is
of the form

(1.5)    $\tfrac{1}{2}(n - 2)(n - 3)\, z^{\frac{1}{2}(n-5)}\, dq\, dz$

q and z satisfying the inequalities

$0 \le z \le 1, \qquad 0 \le q \le 1, \qquad \sqrt{z} \le 1 - q$

and the joint distribution of the canonical correlations $r_1$ and $r_2$ is of the form

(1.6)    $(n - 2)(n - 3)(r_1^2 - r_2^2)(1 - r_1^2)^{\frac{1}{2}(n-5)}(1 - r_2^2)^{\frac{1}{2}(n-5)}\, dr_1\, dr_2$

where n is one less than the number in the sample for each variate.


I 


In Part I of this paper we shall, assuming independence between the two
sets, find the joint moments of q and z for a general value of s and t and extend
the joint distribution of q and z and hence of the canonical correlations to the
case where there are two variates in the first set and any number of variates in
the second, i.e. s = 2 and $t \ge 2$.⁴


1. Joint Moments of q and z. Since we are assuming complete independence
between the two sets of variates we may without any loss of generality represent
the sample values of the second set as points on the first t axes at unit distance
from the origin in a space of n dimensions. The matrix of observations in the
case of s variates in the first set and t variates in the second set will take the form


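(1.7)    $\begin{matrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1t} & \cdots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2t} & \cdots & x_{2n} \\ \vdots & & & & & & \vdots \\ x_{s1} & x_{s2} & x_{s3} & \cdots & x_{st} & \cdots & x_{sn} \\ 1 & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \cdots & 0 \end{matrix}$


The polynomial $D(\lambda)$ of (1.2) in terms of sample variances and covariances
calculated from (1.7) then becomes

(1.8)    $D(\lambda) = \begin{vmatrix} -\lambda a_{11} & \cdots & -\lambda a_{1s} & x_{11} & \cdots & x_{1t} \\ \vdots & & \vdots & \vdots & & \vdots \\ -\lambda a_{s1} & \cdots & -\lambda a_{ss} & x_{s1} & \cdots & x_{st} \\ x_{11} & \cdots & x_{s1} & -\lambda & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{1t} & \cdots & x_{st} & 0 & \cdots & -\lambda \end{vmatrix}$

where $a_{ij} = \sum_{k=1}^{n} x_{ik} x_{jk}$.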


4 This extension is a generalization of Hotelling’s method loc. cit. 
















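We multiply the first s rows of (1.8) by $\lambda$ and factor out $\lambda$ from the last t
columns. This yields

(1.9)    $D(\lambda) = \lambda^{t-s} \begin{vmatrix} -\lambda^2 a_{11} & \cdots & -\lambda^2 a_{1s} & x_{11} & \cdots & x_{1t} \\ \vdots & & \vdots & \vdots & & \vdots \\ -\lambda^2 a_{s1} & \cdots & -\lambda^2 a_{ss} & x_{s1} & \cdots & x_{st} \\ x_{11} & \cdots & x_{s1} & -1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{1t} & \cdots & x_{st} & 0 & \cdots & -1 \end{vmatrix}$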


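As a further simplification, we multiply the $(s + j)$-th column by $x_{ij}$ for all j
from 1 to t and add the result to the i-th column. When this is done for every
value of i from 1 to s and the resulting determinant expanded by means of the
last t columns, the determinantal polynomial (1.9) becomes (a factor $(-1)^t$,
which does not affect the roots, being dropped)

$D(\lambda) = \lambda^{t-s} \begin{vmatrix} b_{11} - \lambda^2 a_{11} & b_{12} - \lambda^2 a_{12} & \cdots & b_{1s} - \lambda^2 a_{1s} \\ \vdots & \vdots & & \vdots \\ b_{s1} - \lambda^2 a_{s1} & b_{s2} - \lambda^2 a_{s2} & \cdots & b_{ss} - \lambda^2 a_{ss} \end{vmatrix}$

or symbolically

(1.10)    $D(\lambda) = \lambda^{t-s}\, |\, b_{ij} - \lambda^2 a_{ij}\, |$

where $b_{ij} = \sum_{k=1}^{t} x_{ik} x_{jk}$.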
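
The column operations leading from (1.9) to (1.10) may be sketched in block form. The block notation below ($X$ for the $s \times t$ array of the $x_{ij}$, $a$ and $b$ for the arrays of the $a_{ij}$ and $b_{ij}$, $I_t$ for the unit matrix) is introduced here for illustration only and does not appear in the original.

```latex
% Adding (right-hand block columns) times X^T to the left-hand block columns:
\[
\begin{vmatrix} -\lambda^{2} a & X \\ X^{T} & -I_{t} \end{vmatrix}
=
\begin{vmatrix} X X^{T} - \lambda^{2} a & X \\ 0 & -I_{t} \end{vmatrix}
= (-1)^{t}\,\bigl|\, b - \lambda^{2} a \,\bigr| ,
\qquad b = X X^{T} .
\]
```

The lower left block vanishes because $X^{T} + (-I_t)X^{T} = 0$, and the determinant of a block-triangular array is the product of the determinants of its diagonal blocks.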


Hence the s roots of $D(\lambda)$ which do not necessarily vanish may be obtained
from the polynomial

(1.11)    $Q(\lambda) = |\, b_{ij} - \lambda^2 a_{ij}\, |$.

The coefficient of the highest power of $\lambda$ in $Q(\lambda)$ is given by $|a_{ij}|$, the
determinant of the elements $a_{ij}$. Taking this in conjunction with (1.3) and (1.4)
we see that

(1.12)    $q^2 = \dfrac{Q(0)}{|a_{ij}|} = \dfrac{|b_{ij}|}{|a_{ij}|}, \qquad z = \dfrac{Q(1)}{|a_{ij}|} = \dfrac{|c_{ij}|}{|a_{ij}|}$

where $c_{ij} = \sum_{k=t+1}^{n} x_{ik} x_{jk}$.
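
The reduction to (1.12) can be illustrated numerically. The sketch below is not from the original: it takes the second set as unit points on the first t axes, computes the squared sample canonical correlations once from the general product-sum formula and once as the roots of $|b_{ij} - \lambda^2 a_{ij}| = 0$, and compares $q^2$ and $z$ with the products of the $r_i^2$ and of the $1 - r_i^2$. The dimensions and the random data are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
s, t, n = 2, 3, 10
x = rng.standard_normal((s, n))      # first set: s variates, n coordinates each
y = np.eye(t, n)                     # second set: unit points on the first t axes

a = x @ x.T                          # a_ij, sums over all n coordinates
b = x[:, :t] @ x[:, :t].T            # b_ij, sums over the first t coordinates
c = x[:, t:] @ x[:, t:].T            # c_ij, sums over the remaining n - t coordinates

# Squared canonical correlations from the general product-sum matrices.
s12 = x @ y.T
s22 = y @ y.T
general = np.linalg.solve(a, s12 @ np.linalg.solve(s22, s12.T))
r2_general = np.sort(np.linalg.eigvals(general).real)

# The same quantities as roots of |b - lambda^2 a| = 0.
r2_reduced = np.sort(np.linalg.eigvals(np.linalg.solve(a, b)).real)

q2 = np.linalg.det(b) / np.linalg.det(a)
z = np.linalg.det(c) / np.linalg.det(a)
print(r2_general, r2_reduced)
print(q2, np.prod(r2_reduced), z, np.prod(1.0 - r2_reduced))
```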
From the equations (1.12) we obtain

(1.13)    $E\{\, |a_{ij}|^{\frac{1}{2}(\alpha+2\beta)}\, q^{\alpha} z^{\beta} \} = E\{\, |b_{ij}|^{\frac{1}{2}\alpha}\, |c_{ij}|^{\beta} \}$

where E stands for the mathematical expectation of the expressions in the { }.

It is obvious from the definition of $b_{ij}$ and $c_{ij}$ that the two determinants $|b_{ij}|$
and $|c_{ij}|$ are independently distributed. Moreover, the joint distribution of q
and z does not depend on the determinant $|a_{ij}|$. The truth of the latter
statement can be seen from the following geometrical considerations. If we
consider the sample values of each variate as a point in an n-dimensional space,
then the two sets of variates determine two flat spaces, one of s dimensions and
one of t dimensions in that space. A sample canonical correlation can then be
considered as the cosine of a certain minimum or stationary angle between two
lines, one line lying in the flat s space and the other in the flat t space. Since q
and z are functions of the canonical correlations, they therefore depend only on
lines and angles between two planes. The quantities $a_{ij}$ on the other hand,
depend on lines and angles lying entirely within one of these planes.

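From the above considerations we see that equation (1.13) can be written as

$E\{\, |a_{ij}|^{\frac{1}{2}(\alpha+2\beta)} \}\, E(q^{\alpha} z^{\beta}) = E(|b_{ij}|^{\frac{1}{2}\alpha})\, E(|c_{ij}|^{\beta})$

or

(1.14)    $E(q^{\alpha} z^{\beta}) = \dfrac{E(|b_{ij}|^{\frac{1}{2}\alpha})\, E(|c_{ij}|^{\beta})}{E(|a_{ij}|^{\frac{1}{2}(\alpha+2\beta)})}$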


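The m-th moment of a determinant $|d_{ij}|$ of sums of sample cross products of p
variates, each sum extending over n terms, is given by the formula⁵

(1.15)    $E(|d_{ij}|^{m}) = \dfrac{2^{pm}}{|D_{ij}|^{m}} \prod_{i=1}^{p} \dfrac{\Gamma\left(\frac{n + 2m + 1 - i}{2}\right)}{\Gamma\left(\frac{n + 1 - i}{2}\right)}$

where $D_{ij}$ denotes the cofactor corresponding to $\sigma_{ij}$ divided by the determinant
$|\sigma_{ij}|$. Substituting (1.15) in (1.14) and simplifying, we get for the joint
moments of q and z

(1.16)    $E(q^{\alpha} z^{\beta}) = \prod_{i=1}^{s} \dfrac{\Gamma\left(\frac{t + \alpha + 1 - i}{2}\right) \Gamma\left(\frac{n - t + 2\beta + 1 - i}{2}\right) \Gamma\left(\frac{n + 1 - i}{2}\right)}{\Gamma\left(\frac{t + 1 - i}{2}\right) \Gamma\left(\frac{n - t + 1 - i}{2}\right) \Gamma\left(\frac{n + \alpha + 2\beta + 1 - i}{2}\right)}$.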
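
The moment formula (1.16) is easily evaluated numerically. The following is a minimal sketch, assuming the reading of (1.16) as a product of gamma-function ratios over i = 1, ..., s; the function name `joint_moment` is hypothetical. For s = 1 the quantity $q^2$ is the squared multiple correlation, whose mean under independence is t/n, and this serves as a check.

```python
from math import lgamma, exp

def joint_moment(alpha, beta, s, t, n):
    """E(q^alpha z^beta) from the gamma-function product read off (1.16)."""
    log_m = 0.0
    for i in range(1, s + 1):
        # contribution of |b_ij| raised to the power alpha / 2
        log_m += lgamma((t + alpha + 1 - i) / 2) - lgamma((t + 1 - i) / 2)
        # contribution of |c_ij| raised to the power beta
        log_m += lgamma((n - t + 2 * beta + 1 - i) / 2) - lgamma((n - t + 1 - i) / 2)
        # contribution of |a_ij| raised to the power alpha / 2 + beta (in the denominator)
        log_m += lgamma((n + 1 - i) / 2) - lgamma((n + alpha + 2 * beta + 1 - i) / 2)
    return exp(log_m)

print(joint_moment(2, 0, s=1, t=3, n=10))   # mean of q^2 for s = 1; should equal t / n
print(joint_moment(0, 1, s=2, t=2, n=10))   # mean of z for s = t = 2
```

The second value can be compared with the mean of z computed directly from (1.5), namely $(n-2)(n-3)/[n(n-1)]$.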


2. Joint Distribution of q and z for s = 2, $t \ge 2$. In order to determine
the joint distribution of q and z for s = 2 and $t \ge 2$, we shall first prove the
following lemma.

LEMMA: Let q and z be defined as in (1.1) for two sets of variates having s variates
in either set and let q' and z' be similarly defined with s < t where s is the number
of variates in the first set and t the number of variates in the second set, then for
n = t + s, the joint distribution of $q^2$ and z is identical with that of z' and $q'^2$.

PROOF. If the number of variates in either set is the same and n = t + s,
then by (1.12)

$q^2 = \dfrac{|b_{ij}|}{|a_{ij}|}, \qquad z = \dfrac{|c_{ij}|}{|a_{ij}|}$


5 Cf. S. S. Wilks, "Certain Generalizations in the Analysis of Variance," Biometrika,
Vol. XXIV, Nov. 1932.






















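where

(1.17)    $b_{ij} = \sum_{k=1}^{s} x_{ik} x_{jk}, \qquad c_{ij} = \sum_{k=s+1}^{t+s} x_{ik} x_{jk}, \qquad a_{ij} = \sum_{k=1}^{t+s} x_{ik} x_{jk}$

the second set here also consisting of s variates.

However, for s < t, and n = t + s, we take for the second set of t variates
points on the t axes at unit distance from the origin in the (t + s)-dimensional
space perpendicular to the first s axes. The matrix of observations in this case
takes the form

(1.18)    $\begin{matrix} x_{11} & x_{12} & \cdots & x_{1s} & x_{1,s+1} & \cdots & x_{1,s+t} \\ x_{21} & x_{22} & \cdots & x_{2s} & x_{2,s+1} & \cdots & x_{2,s+t} \\ \vdots & & & & & & \vdots \\ x_{s1} & x_{s2} & \cdots & x_{ss} & x_{s,s+1} & \cdots & x_{s,s+t} \\ 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 1 \end{matrix}$

Employing the same arguments as in equations (1.8), (1.9) and (1.10) we find
that

(1.19)    $Q(\lambda) = |\, c_{ij} - \lambda^2 a_{ij}\, |, \qquad q'^2 = \dfrac{|c_{ij}|}{|a_{ij}|}, \qquad z' = \dfrac{|b_{ij}|}{|a_{ij}|}$

where

$b_{ij} = \sum_{k=1}^{s} x_{ik} x_{jk}, \qquad c_{ij} = \sum_{k=s+1}^{t+s} x_{ik} x_{jk}, \qquad a_{ij} = \sum_{k=1}^{t+s} x_{ik} x_{jk}$.

Comparing these equations with (1.17) we see that

(1.20)    $z = q'^2, \qquad q^2 = z'$.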



















This proves the lemma.

Now let s = 2. Setting n = t + 2 in equation (1.5) and using the
transformation (1.20) we get for the joint distribution of q' and z'

(1.21)    $\tfrac{1}{2}\, t(t - 1)\, q'^{\,t-2}\, z'^{-\frac{1}{2}}\, dq'\, dz'$.





Let r be the correlation between the two variates of the first set. The
distribution of r in samples for which n = t + 2 when the population correlation is
zero is known to be

(1.22)    $\dfrac{\Gamma\left(\frac{t+2}{2}\right)}{\Gamma\left(\frac{t+1}{2}\right)\sqrt{\pi}}\, (1 - r^2)^{\frac{1}{2}(t-1)}\, dr$.

The distribution of r is independent of q and z. Hence, the joint distribution
of q', z', and r is given by the product of (1.21) and (1.22). Dropping the
primes from q' and z' in (1.21), we get for the joint distribution of the three
quantities in the case n = t + 2,

(1.23)    $\tfrac{1}{2}\, t(t - 1)\, \dfrac{\Gamma\left(\frac{t+2}{2}\right)}{\Gamma\left(\frac{t+1}{2}\right)\sqrt{\pi}}\, q^{t-2}\, z^{-\frac{1}{2}}\, (1 - r^2)^{\frac{1}{2}(t-1)}\, dq\, dz\, dr$.

We shall now derive the joint distribution of q and z for a general value of n
for s = 2, $t \ge 2$. We set $x_1 = x$, $x_2 = y$ and take the t sample variates of the
second set to be points on the first t axes at unit distance from the origin in a
space of n dimensions. As in (1.12) calculate q and z.

(1.24)    $q^2 = \dfrac{\sum_{1}^{t} x^2 \sum_{1}^{t} y^2 - \left(\sum_{1}^{t} xy\right)^2}{1 - r^2}, \qquad z = \dfrac{\sum_{t+1}^{n} x^2 \sum_{t+1}^{n} y^2 - \left(\sum_{t+1}^{n} xy\right)^2}{1 - r^2}$.


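We transform the points $(x_1, \cdots, x_n)$ and $(y_1, \cdots, y_n)$ to hyperspherical
coordinates, the transformation to be represented parametrically by the
equations

(1.25)    $\begin{aligned} x_1 &= \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{t-1} \sin\theta_t \\ x_2 &= \cos\theta_1 \sin\theta_2 \cdots \sin\theta_{t-1} \sin\theta_t \\ x_3 &= \cos\theta_2 \sin\theta_3 \cdots \sin\theta_{t-1} \sin\theta_t \\ &\;\;\vdots \\ x_t &= \cos\theta_{t-1} \sin\theta_t \\ x_{t+1} &= \cos\theta_t \cos\theta_{t+1} \\ x_{t+2} &= \cos\theta_t \sin\theta_{t+1} \cos\theta_{t+2} \\ &\;\;\vdots \\ x_{n-1} &= \cos\theta_t \sin\theta_{t+1} \sin\theta_{t+2} \cdots \cos\theta_{n-1} \\ x_n &= \cos\theta_t \sin\theta_{t+1} \sin\theta_{t+2} \cdots \sin\theta_{n-1} \end{aligned}$

with the same representation for the y's in terms of parameters $\phi_1, \phi_2, \cdots, \phi_{n-1}$.
It is to be observed that in (1.24) and (1.25) $\sum x^2 = 1$, $\sum y^2 = 1$. This we may
assume since q and z are invariant under such transformations.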
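
The parametrization (1.25) may be checked numerically. The sketch below is illustrative only and assumes the reading of (1.25) in which the first t coordinates form a sphere of radius $\sin\theta_t$ and the remaining n - t coordinates a sphere of radius $\cos\theta_t$; the function name `point_on_sphere` is hypothetical.

```python
import numpy as np

def point_on_sphere(theta, t):
    """Map angles theta_1, ..., theta_{n-1} to (x_1, ..., x_n) as read from (1.25)."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size + 1
    th = np.concatenate(([0.0], theta))   # th[k] holds theta_k (1-based indexing)
    x = np.zeros(n + 1)                   # x[k] holds x_k; x[0] is unused
    x[1] = np.prod(np.sin(th[1:t + 1]))
    for k in range(2, t + 1):             # x_2, ..., x_t
        x[k] = np.cos(th[k - 1]) * np.prod(np.sin(th[k:t + 1]))
    for k in range(t + 1, n):             # x_{t+1}, ..., x_{n-1}
        x[k] = np.cos(th[t]) * np.prod(np.sin(th[t + 1:k])) * np.cos(th[k])
    x[n] = np.cos(th[t]) * np.prod(np.sin(th[t + 1:n]))
    return x[1:]

theta = np.random.default_rng(3).uniform(0.1, 1.4, size=6)   # n = 7
x = point_on_sphere(theta, t=3)
print(x @ x)                                       # total sum of squares
print(np.sum(x[:3] ** 2), np.sin(theta[2]) ** 2)   # first t coordinates
print(np.sum(x[3:] ** 2), np.cos(theta[2]) ** 2)   # remaining n - t coordinates
```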

In this new coordinate system, our samples $(x_1, \cdots, x_n)$ and $(y_1, \cdots, y_n)$
are taken as random points on a unit hypersphere about the origin in n
dimensions. There is no loss of generality in this since x and y are assumed to be
uncorrelated in the population and hence possess spherical symmetry of the
density distribution in a space of n dimensions.

The element of probability for the x points on this hypersphere is proportional
to the (n - 1)-dimensional area on this sphere. Now the (n - 1)-dimensional
area is given by

$\sqrt{g}\; d\theta_1\, d\theta_2 \cdots d\theta_{n-1}$























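where g is a determinant of order n - 1 in which the element in the i-th row
and j-th column is

$\sum_{k=1}^{n} \dfrac{\partial x_k}{\partial \theta_i} \dfrac{\partial x_k}{\partial \theta_j}$.

When $i \ne j$, all these quantities vanish as can be seen by inspection from
(1.25). When i = j, we have

$\begin{aligned} \sum_{1}^{n} \left(\frac{\partial x}{\partial \theta_1}\right)^2 &= \sin^2\theta_2 \sin^2\theta_3 \cdots \sin^2\theta_t \\ \sum_{1}^{n} \left(\frac{\partial x}{\partial \theta_2}\right)^2 &= \sin^2\theta_3 \cdots \sin^2\theta_t \\ &\;\;\vdots \\ \sum_{1}^{n} \left(\frac{\partial x}{\partial \theta_t}\right)^2 &= 1 \\ \sum_{1}^{n} \left(\frac{\partial x}{\partial \theta_{t+1}}\right)^2 &= \cos^2\theta_t \\ \sum_{1}^{n} \left(\frac{\partial x}{\partial \theta_{t+2}}\right)^2 &= \cos^2\theta_t \sin^2\theta_{t+1} \\ &\;\;\vdots \\ \sum_{1}^{n} \left(\frac{\partial x}{\partial \theta_{n-1}}\right)^2 &= \cos^2\theta_t \sin^2\theta_{t+1} \cdots \sin^2\theta_{n-2}. \end{aligned}$

Therefore

$g = \sin^2\theta_2 \sin^4\theta_3 \cdots \sin^{2(t-1)}\theta_t \cos^{2(n-t-1)}\theta_t \sin^{2(n-t-2)}\theta_{t+1} \cdots \sin^2\theta_{n-2}$

and hence the element of generalized area is given by

(1.26)    $\sin\theta_2 \sin^2\theta_3 \cdots \sin^{t-1}\theta_t \cos^{n-t-1}\theta_t \sin^{n-t-2}\theta_{t+1} \cdots \sin\theta_{n-2}\; d\theta_1\, d\theta_2 \cdots d\theta_{n-1}$.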



Similarly we can show that the element of generalized area for the y point is

(1.27)    $\sin\phi_2 \sin^2\phi_3 \cdots \sin^{t-1}\phi_t \cos^{n-t-1}\phi_t \sin^{n-t-2}\phi_{t+1} \cdots \sin\phi_{n-2}\; d\phi_1\, d\phi_2 \cdots d\phi_{n-1}$.

The joint distribution of $\theta_1, \theta_2, \cdots, \theta_{n-1}$ and $\phi_1, \phi_2, \cdots, \phi_{n-1}$ (since the
$\theta$'s are independent of the $\phi$'s) is proportional to the product of (1.26) and (1.27).

We now introduce four new sets of variables, u, v, u', v', defined by the
following equations

(1.28)    $x_i = u_i \sin\theta_t, \qquad y_i = v_i \sin\phi_t \qquad (i = 1, 2, \cdots, t)$
(1.29)    $x_j = u'_j \cos\theta_t, \qquad y_j = v'_j \cos\phi_t \qquad (j = t+1, \cdots, n)$.


The $u_i$ and $v_i$ can be regarded as two points on a sphere in a space of t
dimensions and $u'_j$ and $v'_j$ as two points on a sphere in a space of n - t dimensions.

Let $\lambda$ be the angle between the two points u and v and $\mu$ the angle between
the two points u' and v'. Then

$\cos\lambda = \sum_{i=1}^{t} u_i v_i, \qquad \cos\mu = \sum_{j=t+1}^{n} u'_j v'_j$.

The probability element for $\lambda$ is proportional to $\sin^{t-2}\lambda\, d\lambda$, and that for $\mu$ is
proportional to $\sin^{n-t-2}\mu\, d\mu$.

From the definition of $u_i$ and $v_i$, we see that they depend only on
$\theta_1, \theta_2, \cdots, \theta_{t-1}$; $\phi_1, \phi_2, \cdots, \phi_{t-1}$ respectively, and $u'_j$ and $v'_j$ depend only on
$\theta_{t+1}, \theta_{t+2}, \cdots, \theta_{n-1}$; $\phi_{t+1}, \phi_{t+2}, \cdots, \phi_{n-1}$ respectively. It follows that the
quantities $\lambda$, $\mu$, $\theta_t$, and $\phi_t$ are independently distributed.

The joint distribution of the $\theta$'s and $\phi$'s we integrate between constant limits
with respect to all the variates except $\theta_t$ and $\phi_t$. This gives for the joint
distribution of $\theta_t$ and $\phi_t$

$A_n \sin^{t-1}\theta_t \sin^{t-1}\phi_t \cos^{n-t-1}\theta_t \cos^{n-t-1}\phi_t\; d\theta_t\, d\phi_t$

where $A_n$ is a constant depending only on n.

Multiplying this by the distributions of $\lambda$ and $\mu$ and dropping the subscript t
from $\theta$ and $\phi$ we get for the joint distribution of $\lambda$, $\mu$, $\theta$, and $\phi$

(1.30)    $k_n \sin^{t-1}\theta \sin^{t-1}\phi \cos^{n-t-1}\theta \cos^{n-t-1}\phi \sin^{t-2}\lambda \sin^{n-t-2}\mu\; d\theta\, d\phi\, d\lambda\, d\mu$

where $k_n$ is a constant depending on n. The limits of integration for $\theta$ and $\phi$
are 0 and $\pi/2$; for $\lambda$ and $\mu$ they are 0 and $\pi$.

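Expressing q and z in terms of the new quantities as defined in (1.25), (1.28)
and (1.29) we get

(1.31)    $q^2 = \dfrac{\left(\sum_{1}^{t} x^2\right)\left(\sum_{1}^{t} y^2\right) - \left(\sum_{1}^{t} xy\right)^2}{1 - r^2} = \dfrac{\sin^2\theta \sin^2\phi \sin^2\lambda}{1 - r^2}$

(1.32)    $z = \dfrac{\left(\sum_{t+1}^{n} x^2\right)\left(\sum_{t+1}^{n} y^2\right) - \left(\sum_{t+1}^{n} xy\right)^2}{1 - r^2} = \dfrac{\cos^2\theta \cos^2\phi \sin^2\mu}{1 - r^2}$

where

(1.33)    $r = \sum xy = \sin\theta \sin\phi \cos\lambda + \cos\theta \cos\phi \cos\mu$

is the sample correlation between x and y.

We now consider a transformation of the variables $\theta$, $\phi$, and $\mu$ in (1.30) to
the new variables q, z, and r. Without troubling to compute the Jacobian J
of the transformation, we know that it is independent of n since the relations
(1.31), (1.32) and (1.33) do not involve n. Substituting from (1.31) and (1.32)
into (1.30) we get for the joint distribution of q, z, r, and $\lambda$

$k_n\, \psi\, q^{t-2}\, z^{\frac{1}{2}(n-t-3)}\, (1 - r^2)^{\frac{1}{2}(n-3)}\; dq\, dz\, dr\, d\lambda$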


























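where $\psi$ is independent of n. Integrating with respect to $\lambda$ between limits which
are independent of n, we get for the joint distribution of q, z, and r

(1.34)    $k_n\, \psi'\, q^{t-2}\, z^{\frac{1}{2}(n-t-3)}\, (1 - r^2)^{\frac{1}{2}(n-3)}\; dq\, dz\, dr$.

But, for n = t + 2, this joint distribution reduces to (1.23). Therefore

$k_{t+2}\, \psi' = \tfrac{1}{2}\, t(t - 1)\, \dfrac{\Gamma\left(\frac{t+2}{2}\right)}{\Gamma\left(\frac{t+1}{2}\right)\sqrt{\pi}}$

and $\psi'$, being independent of n, is a constant, so that (1.34) can be written as

$k'_n\, q^{t-2}\, z^{\frac{1}{2}(n-t-3)}\, (1 - r^2)^{\frac{1}{2}(n-3)}\; dq\, dz\, dr$.

However, since the distribution of r is known to be

$\dfrac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)\sqrt{\pi}}\, (1 - r^2)^{\frac{1}{2}(n-3)}\, dr$

we finally get for the joint distribution of q and z

$h_n\, q^{t-2}\, z^{\frac{1}{2}(n-t-3)}\; dq\, dz$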










where $h_n$ depends on n. The integral over the entire region defined by the
inequalities

$0 \le q \le 1, \qquad 0 \le z \le 1, \qquad z \le (1 - q)^2$

must equal unity; the constant $h_n$ is therefore readily found to be

$\dfrac{(n - 2)!}{2\,(t - 2)!\,(n - t - 2)!}$.

Thus the joint distribution in the final form is

(1.35)    $\dfrac{(n - 2)!}{2\,(t - 2)!\,(n - t - 2)!}\; q^{t-2}\, z^{\frac{1}{2}(n-t-3)}\; dq\, dz$.
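
The normalizing constant in (1.35) can be verified by direct numerical integration. The following is a minimal sketch, assuming the form $q^{t-2} z^{\frac{1}{2}(n-t-3)}$ of the density and the region $0 \le q \le 1$, $0 \le z \le (1-q)^2$; the function names and the grid size are illustrative choices, and the midpoint rule is used only for simplicity.

```python
import numpy as np
from math import factorial

def density_135(q, z, n, t):
    """Density of (q, z) as read from (1.35)."""
    const = factorial(n - 2) / (2 * factorial(t - 2) * factorial(n - t - 2))
    return const * q ** (t - 2) * z ** ((n - t - 3) / 2)

def total_probability(n, t, m=400):
    """Midpoint rule in q and in w = z / (1 - q)^2, both running over (0, 1)."""
    mid = (np.arange(m) + 0.5) / m
    q, w = np.meshgrid(mid, mid, indexing="ij")
    z = (1.0 - q) ** 2 * w
    jac = (1.0 - q) ** 2                  # dz = (1 - q)^2 dw
    return float(np.sum(density_135(q, z, n, t) * jac) / m ** 2)

print(total_probability(n=9, t=3))        # should be close to 1
print(total_probability(n=8, t=2))        # the case covered by (1.5)
```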


Now by (1.3) and (1.4), $q = r_1 r_2$, $z = (1 - r_1^2)(1 - r_2^2)$, and hence the Jacobian

(1.36)    $J = \dfrac{\partial(q, z)}{\partial(r_1, r_2)} = 2(r_1^2 - r_2^2)$.

Making the transformation in (1.35) we get the joint distribution of the
canonical correlations $r_1$ and $r_2$ (for the case s = 2 and a general value of t) in
the form

(1.37)    $\dfrac{(n - 2)!}{(t - 2)!\,(n - t - 2)!}\, (r_1^2 - r_2^2)\, (r_1 r_2)^{t-2}\, [(1 - r_1^2)(1 - r_2^2)]^{\frac{1}{2}(n-t-3)}\; dr_1\, dr_2$.
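
For completeness, the Jacobian used in passing from (1.35) to (1.37) may be written out; the following short computation is supplied here as a check and assumes $r_1 \ge r_2 \ge 0$.

```latex
\[
\frac{\partial(q,z)}{\partial(r_1,r_2)}
= \begin{vmatrix} r_2 & r_1 \\ -2r_1(1-r_2^{2}) & -2r_2(1-r_1^{2}) \end{vmatrix}
= -2r_2^{2}(1-r_1^{2}) + 2r_1^{2}(1-r_2^{2})
= 2\,(r_1^{2}-r_2^{2}),
\]
\[
dq\,dz = 2\,(r_1^{2}-r_2^{2})\,dr_1\,dr_2 .
\]
```

The factor 2 cancels the 2 in the denominator of the constant of (1.35).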




II. JOINT LIMITING DISTRIBUTIONS OF CANONICAL 
CORRELATIONS AND LATENT ROOTS 


In formula (1.37) we set

$k_1 = n r_1^2, \qquad k_2 = n r_2^2$

and get for the joint distribution of $k_1$ and $k_2$

(2.1)    $\dfrac{(n - 2)!}{4\,(t - 2)!\,(n - t - 2)!\; n^{t}}\, (k_1 - k_2)\, (k_1 k_2)^{\frac{1}{2}(t-3)} \left[\left(1 - \dfrac{k_1}{n}\right)\left(1 - \dfrac{k_2}{n}\right)\right]^{\frac{1}{2}(n-t-3)} dk_1\, dk_2$.

When $n \to \infty$, the quantity $\dfrac{(n - 2)!}{n^{t}\,(n - t - 2)!}$ approaches 1 and
$\left(1 - \dfrac{k_i}{n}\right)^{\frac{1}{2}(n-t-3)}$ approaches $e^{-\frac{1}{2} k_i}$. Hence the limiting distribution of the two
canonical correlations is given by

(2.2)    $\dfrac{1}{4\,(t - 2)!}\, (k_1 - k_2)\, (k_1 k_2)^{\frac{1}{2}(t-3)}\, e^{-\frac{1}{2}(k_1 + k_2)}\; dk_1\, dk_2$.

We shall call (2.2) the "generalized chi-square" distribution and show that the
roots of the characteristic polynomial

(2.3)    $g(k) = \begin{vmatrix} a_{11} - k & a_{12} \\ a_{21} & a_{22} - k \end{vmatrix}$

are distributed in precisely this form. Here $a_{ij} = \sum x_i x_j$ where $x_1$ and $x_2$ are
normally and independently⁶ distributed with unit variance in the population
and zero mean in the sample.


Let $k_1$ and $k_2$ be the roots of (2.3). That is, $k_1$ and $k_2$ are the two roots of
the quadratic equation

(2.4)    $k^2 - p_1 k + p_2 = 0$

where

(2.5)    $p_1 = k_1 + k_2 = a_{11} + a_{22}$
(2.6)    $p_2 = k_1 k_2 = a_{11} a_{22} - a_{12}^2$.

In the absence of correlation in the population, the joint distribution of $a_{11}$,
$a_{22}$ and $a_{12}$ is known to be

(2.7)    $h_n \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}^{\frac{1}{2}(n-3)} e^{-\frac{1}{2}(a_{11} + a_{22})}\; da_{11}\, da_{22}\, da_{12}$

where $h_n$ is a constant depending only on n.


6 The part of the assumption relating to independence may be removed without loss of 
generality. See last paragraph below. 




















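We consider a transformation to the variables $p_1$, $p_2$ and $a_{12}$. From (2.5)
and (2.6) we calculate the Jacobian J of the transformation,

(2.8)    $J = \dfrac{\partial(a_{11}, a_{22})}{\partial(p_1, p_2)} = \dfrac{1}{a_{11} - a_{22}}$

and since

(2.9)    $2a_{11} = p_1 + (p_1^2 - 4p_2 - 4a_{12}^2)^{\frac{1}{2}}$

(2.10)    $J = \dfrac{1}{(p_1^2 - 4p_2 - 4a_{12}^2)^{\frac{1}{2}}}$.

Substituting from (2.5) and (2.6) into (2.7) and multiplying by J, we get for
the joint distribution of $p_1$, $p_2$ and $a_{12}$

(2.11)    $\dfrac{h_n\, p_2^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2} p_1}\; dp_1\, dp_2\, da_{12}}{(p_1^2 - 4p_2 - 4a_{12}^2)^{\frac{1}{2}}}$.

We make the transformation $u = a_{12}^2$ and get for the joint distribution of $p_1$,
$p_2$ and u

(2.12)    $\dfrac{h_n\, p_2^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2} p_1}\; dp_1\, dp_2\, du}{(bu - 4u^2)^{\frac{1}{2}}}$

where $b = p_1^2 - 4p_2$.

Since both $a_{11}$ and $a_{22}$ are real, equation (2.9) shows that $b - 4u \ge 0$. Hence
the limits of integration for u are 0 and $b/4$. Integrating out u in (2.12) between
the above limits we obtain the joint distribution of $p_1$ and $p_2$.
Now the integral

(2.13)    $\int_{0}^{b/4} \dfrac{du}{(bu - 4u^2)^{\frac{1}{2}}} = \tfrac{1}{2}\left[\sin^{-1}\left(\dfrac{8u - b}{b}\right)\right]_{0}^{b/4} = c$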









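where c is some constant. Hence the joint distribution of $p_1$ and $p_2$ is given by

(2.14)    $H_n\, p_2^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2} p_1}\; dp_1\, dp_2$.

By integrating (2.14) over the region $0 \le p_2 \le \tfrac{1}{4} p_1^2$ and $0 \le p_1 \le \infty$ we get

$H_n = \dfrac{1}{4\,(n - 2)!}$.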
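
The value of $H_n$ follows from an elementary integration, which may be sketched as follows (this working is supplied here and is not in the original):

```latex
\[
\int_{0}^{\infty} e^{-\frac12 p_1}\int_{0}^{\frac14 p_1^{2}} p_2^{\frac12(n-3)}\,dp_2\,dp_1
= \frac{2}{n-1}\,2^{-(n-1)}\int_{0}^{\infty} p_1^{\,n-1}\,e^{-\frac12 p_1}\,dp_1
= \frac{2}{n-1}\,2^{-(n-1)}\,2^{n}\,(n-1)! = 4\,(n-2)! ,
\]
\[
\text{so that}\qquad H_n = \frac{1}{4\,(n-2)!} .
\]
```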
We next transform $p_1$ and $p_2$ in terms of $k_1$ and $k_2$ from (2.5) and get for the
joint distribution of $k_1$ and $k_2$

(2.15)    $\dfrac{1}{4\,(n - 2)!}\, (k_1 - k_2)\, (k_1 k_2)^{\frac{1}{2}(n-3)}\, e^{-\frac{1}{2}(k_1 + k_2)}\; dk_1\, dk_2$.

This distribution is identical with that of (2.2) with n = t.
The above is an example of a more general

THEOREM: Let $r_1, r_2, \cdots, r_s$ be a set of simple finite canonical roots of the
two independent sets of variates $x_1, \cdots, x_s$, and $x_{s+1}, \cdots, x_{s+t}$. Let $k_i = n r_i^2$
$(i = 1, 2, \cdots, s)$. Then the joint limiting distribution of the k's approaches
the exact joint sampling distribution of the latent roots of a matrix of sample product
sums with t degrees of freedom of s normally distributed variates having unit variance
in the population.

PROOF: The proof follows from equation (1.11). For let us multiply and
divide $a_{ij}$ in (1.11) by n and set $n\lambda^2 = k$. The determinantal polynomial
becomes

(2.16)    $g(k) = |\, b_{ij} - k s_{ij}\, |$

where $s_{ij} = a_{ij}/n$.

Without any loss of generality, we so transform the first set of variates that
they become of zero correlation and unit variance in the population. Then it
follows that

$E(s_{ij}) = E(x_i x_j) = \delta_{ij}$

where $\delta_{ij}$ equals zero for $i \ne j$ and 1 for i = j.

Now let $P(x \ge a)$ stand for the probability that the variate x be greater
than or equal to some constant a. Then, by the Strong Law of Large Numbers
we can state that, given an $\epsilon > 0$ and a $\delta > 0$ there exists a positive integer $n_0$
such that for $n > n_0$

$P\{\, |s_{ij} - \delta_{ij}| \ge \delta \,\} \le \epsilon$.

If then we let n increase indefinitely, the quantity $b_{ij} = \sum_{1}^{t} x_i x_j$ remains fixed
while $s_{ij}$ approaches, in the probability sense, $\delta_{ij}$. Since the roots of a
polynomial are continuous functions of the coefficients, we can, by an extension of
the Law of Large Numbers, show that in the limit the roots of (2.16) will be
distributed like the roots of the polynomial

$g(k) = |\, b_{ij} - k\delta_{ij}\, |$.

This proves the theorem.

COROLLARY 1. The limiting distribution of $q^2$ in case of complete independence
between the two sets of variates approaches the exact distribution of a generalized
sample variance (i.e. a determinant of sample variances and covariances) with t
degrees of freedom. The proof follows from the fact that $q^2$ is a product of the
roots of (1.11) and therefore by the above theorem, is distributed in the limit like $|b_{ij}|$.

COROLLARY 2. The distribution of the sum of the squares of the canonical
correlations approaches in the limit a $\chi^2$ distribution with st degrees of freedom.
This is obvious since in the limit the sum of the squares of the roots, by the above
theorem, has the distribution of $b_{11} + b_{22} + \cdots + b_{ss}$ and each $b_{ii}$ is distributed
like $\chi^2$ with t degrees of freedom.
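
Corollary 2 lends itself to a simple sampling experiment. The sketch below is an illustration only: it draws the first set as independent unit normal variates, takes the second set as unit points on the first t axes as in Part I, and examines n times the sum of the squared sample canonical correlations, whose mean and variance should be near st and 2st for large n. The function name, the seed and the sample sizes are arbitrary assumptions.

```python
import numpy as np

def scaled_root_sums(s, t, n, reps, seed=0):
    """Simulate n * (r_1^2 + ... + r_s^2) under independence of the two sets."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for k in range(reps):
        x = rng.standard_normal((s, n))
        a = x @ x.T                        # a_ij on n degrees of freedom
        b = x[:, :t] @ x[:, :t].T          # b_ij on t degrees of freedom
        r2 = np.linalg.eigvals(np.linalg.solve(a, b)).real
        out[k] = n * r2.sum()
    return out

sums = scaled_root_sums(s=2, t=3, n=400, reps=2000)
print(sums.mean(), sums.var())             # compare with st = 6 and 2st = 12
```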

While the canonical roots of (1.2) are invariant under any non-singular linear
transformations, the latent roots of a determinant of sample covariances are
invariant only under an orthogonal transformation. But there exists an
orthogonal transformation which reduces a set of variates having a multivariate
normal distribution to a set which are normally and independently distributed
with variances equal to the latent roots of the population generalized variances
of the original variates. Hence, in dealing with the distribution of latent roots,
we may assume independence in the population without any loss of generality
but the assumption of equal variance leads only to a special case. Moreover,
the above consideration also explains the form of the asymptotic error of the
sample latent root given in Part III of this paper.


III. ASYMPTOTIC STANDARD ERRORS OF LATENT ROOTS AND
COEFFICIENTS OF PRINCIPAL COMPONENTS 


1. Many statisticians have had occasion to use in their statistical analyses
characteristic roots (or as they are sometimes called "latent" roots) of
determinants of correlations or covariances. Especially has this become true since
the publication of Hotelling's paper on principal components.⁷ It is therefore of
great importance to find, if not their sampling distributions, at least their
limiting distributions and their asymptotic standard errors. This we shall do in
this paper for the case of non-vanishing simple roots and by the same method⁸
get the asymptotic variances and covariances of the coefficients of principal
components. We have already derived in Part II the sampling distribution of
the two latent roots of a determinant of covariances obtained from two
normally distributed variates having equal variance in the population. This
distribution is of no great importance in itself except that it gives us some idea
as to the form of the distribution in the general case.

In what follows, we shall use the convention that a repeated subscript in the
same term stands for summation. If repeated subscripts appearing in a term
are not to be summed, we shall place them in brackets following the expression
in which they appear. Thus in the equation (3.1) below, we sum with respect
to j but not with respect to q even though on the right hand side q appears twice.

Let $x_1, x_2, \cdots, x_p$ be a set of variates which have a multivariate normal
distribution. We assume that these variates have been resolved into
components by Hotelling's method.⁹ Let $y_1, y_2, \cdots, y_p$ be the principal
components. Then $x_i = a_{ij} y_j$. The $a_{ij}$'s satisfy the following equations:

(3.1)    $a_{jq}\sigma_{ij} = \lambda_q a_{iq}$,    [q]
(3.2)    $a_{ip} a_{iq} = \lambda_q \delta_{pq}$


7 "Analysis of a Complex of Statistical Variables into Principal Components," The
Journal of Educational Psychology, Sept. and Oct. 1933. See also M. A. Girshick,
"Principal Components," Journal of the American Statistical Association, Vol. 31, Sept. 1936.

8 The method here employed is parallel to the one used by Hotelling in his paper of 1936
in deriving asymptotic standard errors for canonical correlations.

9 Loc. cit.



















where the symbol $\delta_{pq}$ has the value zero for $p \ne q$ and 1 for p = q, $\lambda_q$ is a root
of the characteristic equation

(3.3)    $|\, \sigma_{ij} - \lambda\delta_{ij}\, | = 0$

and $\sigma_{ij}$ is the population covariance of $x_i$ and $x_j$.

If we multiply (3.1) by $a_{ip}$, sum with respect to i and use (3.2), we get

(3.4)    $a_{ip} a_{jq} \sigma_{ij} = \lambda_q^2 \delta_{pq}$.

When a root of (3.3) is simple and not equal to zero, the corresponding $a_{ij}$'s
and the root itself are definite analytic functions of the $\sigma_{ij}$'s over a region without
singularities. A set of sampling errors $d\sigma_{ij}$ in the covariances will then
determine a corresponding set of sampling errors in the $a_{ij}$'s and in the root.

We assume then, that the roots $\lambda_1, \lambda_2, \cdots, \lambda_t$ of (3.3) we are considering are
simple and non-vanishing. In terms of the derivatives of the analytic functions
we define

(3.5)    $d\lambda_i = \dfrac{\partial \lambda_i}{\partial \sigma_{pq}}\, d\sigma_{pq}, \qquad da_{ij} = \dfrac{\partial a_{ij}}{\partial \sigma_{pq}}\, d\sigma_{pq}$

where $d\sigma_{pq} = s_{pq} - \sigma_{pq}$, $s_{pq}$ being the corresponding sample covariance.
Differentiating equation (3.1) and employing the above formulae we get

(3.6)    $\sigma_{ij}\, da_{jq} + a_{jq}\, d\sigma_{ij} = \lambda_q\, da_{iq} + a_{iq}\, d\lambda_q$.    [q]

We now multiply this equation by $a_{ip}$, sum with respect to i, and use equations
(3.1) and (3.2). This yields:

(3.7)    $\lambda_p a_{jp}\, da_{jq} + a_{ip} a_{jq}\, d\sigma_{ij} = \lambda_q a_{ip}\, da_{iq} + \lambda_q \delta_{pq}\, d\lambda_q$.    [p, q]

When p = q, the term $\lambda_p a_{jp}\, da_{jp}$ cancels out and equation (3.7) reduces to

(3.8)    $\lambda_p\, d\lambda_p = a_{ip} a_{jp}\, d\sigma_{ij}$.    [p]

We change the subscripts p, i, j, to q, k, m, in (3.8) and multiply together the
two equations thus obtained. This gives:

(3.9)    $\lambda_p \lambda_q\, d\lambda_p\, d\lambda_q = a_{ip} a_{jp} a_{kq} a_{mq}\, d\sigma_{ij}\, d\sigma_{km}$.    [p, q]

Hence

(3.10)    $\lambda_p \lambda_q\, E(d\lambda_p\, d\lambda_q) = a_{ip} a_{jp} a_{kq} a_{mq}\, E(d\sigma_{ij}\, d\sigma_{km})$    [p, q]

where the symbol E denotes the mathematical expectation or mean value of the
expression following.

Now it can be easily shown by means of the characteristic function of a
multivariate normal distribution that

(3.11)    $E(d\sigma_{ij}\, d\sigma_{km}) = \dfrac{1}{n}\, (\sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk})$










where n is one less than the number in the sample. Substituting this expression
in equation (3.10) and using (3.4) we get the following rather simple result

(3.12)    $\lambda_p \lambda_q\, E(d\lambda_p\, d\lambda_q) = \dfrac{2\lambda_p^4 \delta_{pq}}{n}$.    [p, q]

Setting p = q in this formula we get

(3.13)    $E[(d\lambda_p)^2] = \dfrac{2\lambda_p^2}{n}$.

But when $p \ne q$

(3.14)    $E[d\lambda_p\, d\lambda_q] = 0$.
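
The substitution leading to (3.12) may be written out in full. The working below is supplied as a check and uses only (3.4), (3.10) and (3.11), with p and q not summed:

```latex
\begin{align*}
a_{ip}a_{jp}a_{kq}a_{mq}\,E(d\sigma_{ij}\,d\sigma_{km})
&= \frac{1}{n}\Bigl[(a_{ip}a_{kq}\sigma_{ik})(a_{jp}a_{mq}\sigma_{jm})
  + (a_{ip}a_{mq}\sigma_{im})(a_{jp}a_{kq}\sigma_{jk})\Bigr] \\
&= \frac{1}{n}\Bigl[(\lambda_q^{2}\delta_{pq})(\lambda_q^{2}\delta_{pq})
  + (\lambda_q^{2}\delta_{pq})(\lambda_q^{2}\delta_{pq})\Bigr]
 = \frac{2\lambda_p^{4}\,\delta_{pq}}{n} .
\end{align*}
```

Dividing by $\lambda_p\lambda_q$ gives $2\lambda_p^2/n$ when p = q and zero otherwise.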

Let $l_1, l_2, \cdots, l_t$ be the corresponding latent roots of a determinant of
sample covariances. The sample latent root $l_p$ may be expanded about $\lambda_p$ in a
Taylor series of the form

(3.15)    $l_p = \lambda_p + \dfrac{\partial \lambda_p}{\partial \sigma_{ij}}\, d\sigma_{ij} + \dfrac{1}{2}\, \dfrac{\partial^2 \lambda_p}{\partial \sigma_{ij}\, \partial \sigma_{km}}\, d\sigma_{ij}\, d\sigma_{km} + \cdots$

or, by (3.5)

(3.16)    $l_p - \lambda_p = d\lambda_p + \cdots$.

Squaring both sides of (3.16), taking the expected value, and using (3.13) we
find that the sample variance of a latent root $l_p$, apart from terms of higher order in
$n^{-1}$, is given by

$\sigma_{l_p}^2 = \dfrac{2\lambda_p^2}{n}$.

If in (3.11) we set i = j = k = m, we get the variance of a sample variance,
and it is interesting to note that its form is identical with the first term of the
asymptotic expansion of the variance of a sample latent root.

The sample covariance of any two distinct roots is by (3.14) zero for the first
term of the asymptotic expansion. That is, the covariance is at least of order
$n^{-2}$. All the above results also follow from the fact, shown by the author in a
previous paper,¹⁰ that the coefficients of the principal components and hence the
latent roots are maximum likelihood statistics. This property of the latent
roots permits us also to state the following

THEOREM: Let $\lambda_1, \lambda_2, \cdots, \lambda_t$ be any set of simple non-vanishing roots of (3.3).
For sufficiently large samples these will be approximated by certain of the latent roots
$l_1, l_2, \cdots, l_t$ of the samples. If $l_i - \lambda_i$ is divided by the standard error

$\sigma_{l_i} = \lambda_i\, (2/n)^{\frac{1}{2}}$

the resulting variates have a distribution which, as n increases, approaches the
normal distribution of t independent variates of zero mean and unit standard
deviation.


10 Loc. cit. 


































































































































COROLLARY: Let $\lambda_1$ be a maximum simple, non-vanishing root of (3.3) and let
$l_1$ be the corresponding maximum sample root. Then, $l_1 - \lambda_1$ divided by its
standard error has a distribution approaching normality in the limit.


2. The Variance of Log $l$. The formula for the standard error of the latent root given above contains a population parameter $\lambda$, the numerical value of which we usually do not know. It is therefore important to find a transformation of the latent root to a new variate which will have for the leading term of its asymptotic standard error a quantity independent of the population parameter.

Let $k = f(l)$ be such a transformation. Then $K = f(\lambda)$ is the corresponding transformation for the population root.

We now expand $k$ in a Taylor series about $l = \lambda$

(3.17) $dk = f'(\lambda)\,dl + \tfrac{1}{2}f''(\lambda)(dl)^2 + \cdots$

and get an approximation

(3.18) $dk = f'(\lambda)\,dl$.

Squaring both sides and taking the expectation, we get

(3.19) $E(dk)^2 = [f'(\lambda)]^2 E[(dl)^2] = [f'(\lambda)]^2\,\dfrac{2\lambda^2}{n}$.

Now set $E(dk)^2 = 2/n$. Then, from (3.19)

$$f'(\lambda) = 1/\lambda$$

or

(3.20) $f(\lambda) = \log \lambda$.

Hence, if we set $k = \log l$, then

(3.21) $\sigma_k^2 = 2/n$

is an approximation to the variance of $k$ and is independent of any population parameter.
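The effect of the logarithmic transformation can be checked by simulation in the simplest case of a single variate, where the latent root is just a variance estimate. The following is a minimal sketch under stated assumptions: the population mean is taken as known (zero), so that the estimate has exactly $n$ degrees of freedom, and the function name `log_root_variance` is illustrative only.

```python
import math
import random
import statistics


def log_root_variance(lam, n, reps=2000, seed=0):
    """Monte Carlo variances of l and log(l).

    l is a variance estimate on n degrees of freedom drawn from a normal
    population with variance lam (the one-variate case of a latent root).
    """
    rng = random.Random(seed)
    sd = math.sqrt(lam)
    roots = []
    for _ in range(reps):
        # Mean assumed known (zero), so l has exactly n degrees of freedom.
        l = sum(rng.gauss(0.0, sd) ** 2 for _ in range(n)) / n
        roots.append(l)
    var_l = statistics.pvariance(roots)
    var_log_l = statistics.pvariance([math.log(l) for l in roots])
    return var_l, var_log_l


if __name__ == "__main__":
    n = 200
    for lam in (1.0, 9.0):
        var_l, var_log_l = log_root_variance(lam, n)
        # var_l should track 2*lam^2/n; var_log_l should stay near 2/n.
        print(lam, var_l, 2 * lam ** 2 / n, var_log_l, 2 / n)
```

The variance of $l$ scales with $\lambda^2$, while the variance of $\log l$ stays close to $2/n$ whatever value of $\lambda$ is used, which is the property (3.21) expresses.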


3. The Asymptotic Variances and Covariances of Roots of Determinants of Correlations. While the formulas for the asymptotic standard errors of the latent roots of a determinant of covariances are rather simple, this is not the case with the roots of a determinant of correlations. In deriving the asymptotic standard errors of simple non-vanishing roots of a determinant of correlations, we again assume that the variates $x_1, x_2, \cdots, x_s$, which in this case are of unit variance in the population, have been resolved into principal components. The equations of the previous section, up to and including (3.10), remain the same except that we substitute $\rho_{ij}$ for every $\sigma_{ij}$, where $\rho_{ij}$ is the population correlation of $x_i$ with $x_j$. Thus equation (3.10) becomes

(3.22) $\lambda_p\lambda_q E(d\lambda_p\,d\lambda_q) = a_{ip}a_{jp}a_{kq}a_{mq}E(d\rho_{ij}\,d\rho_{km})$, [p, q]














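where $d\rho_{ij} = r_{ij} - \rho_{ij}$, $r_{ij}$ being the sample correlation between $x_i$ and $x_j$. The expected value of $d\rho_{ij}\,d\rho_{km}$ is not, as in the case of the $\sigma$'s, given in the simple form of (3.11); rather it is given asymptotically, the leading term in $n^{-1}$ being the following lengthy expression:

(3.23) $nE(d\rho_{ij}\,d\rho_{km}) = \rho_{ik}\rho_{mj} + \rho_{kj}\rho_{mi} - \rho_{ij}\rho_{ki}\rho_{mi} - \rho_{ij}\rho_{kj}\rho_{mj}$
$\qquad - \rho_{km}\rho_{ki}\rho_{kj} + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{ki}^2 + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{kj}^2$
$\qquad - \rho_{km}\rho_{mi}\rho_{mj} + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{mi}^2 + \tfrac{1}{2}\rho_{ij}\rho_{km}\rho_{mj}^2$. [i, j, k, m]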


Substituting this in (3.22) and simplifying by means of equations (3.1) and 
(3.4) we finally get 


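(3.24) $n\lambda_p\lambda_q E(d\lambda_p\,d\lambda_q) = 2(\lambda_p^2\lambda_q^2\delta_{pq} + \lambda_p\lambda_q a_{ip}^2a_{jq}^2\rho_{ij}^2) - 2(\lambda_p^2\lambda_q a_{ip}^2a_{iq}^2 + \lambda_p\lambda_q^2 a_{ip}^2a_{iq}^2)$. [p, q]

When $p = q$, (3.24) becomes

(3.25) $E(d\lambda_p)^2 = \dfrac{2}{n}\left[\lambda_p^2 + a_{ip}^2a_{jp}^2\rho_{ij}^2 - 2\lambda_p\sum_i a_{ip}^4\right]$. [p]

When $p \ne q$,

(3.26) $E(d\lambda_p\,d\lambda_q) = \dfrac{2}{n}\left[a_{ip}^2a_{jq}^2\rho_{ij}^2 - (\lambda_p + \lambda_q)a_{ip}^2a_{iq}^2\right]$. [p, q]

Hence (3.25) is the leading term of the asymptotic expansion of the variance of $\lambda_p$, and (3.26) is the leading term of the asymptotic expansion of the covariance of $\lambda_p$ and $\lambda_q$, where $\lambda_p$ and $\lambda_q$ are simple, non-vanishing roots of a determinant of correlations.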
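Formulas (3.25) and (3.26) can be sanity-checked on two variates with correlation $\rho$, where the roots are $1 \pm \rho$ and the sample roots are $1 \pm r$, so that both variances should equal the familiar $(1 - \rho^2)^2/n$ and the covariance its negative. The sketch below is illustrative only: the function name `root_covariance` and the 0-based indexing are assumptions, and the coefficients are scaled so that $\sum_i a_{ip}^2 = \lambda_p$, as the relation $A^{tp} = a_{tp}/\lambda_p$ used later in the text implies.

```python
import math


def root_covariance(a, rho, lam, n, p, q):
    """Leading term of E(d lambda_p d lambda_q) from (3.25)/(3.26).

    a[i][p] : coefficient of variate i on component p (column p has squared
              length lam[p]); rho : population correlation matrix;
    lam : latent roots; n : sample size. Indices are 0-based.
    """
    s = len(lam)
    cross = sum(a[i][p] ** 2 * a[j][q] ** 2 * rho[i][j] ** 2
                for i in range(s) for j in range(s))
    overlap = sum(a[i][p] ** 2 * a[i][q] ** 2 for i in range(s))
    if p == q:
        return (2.0 / n) * (lam[p] ** 2 + cross - 2.0 * lam[p] * overlap)
    return (2.0 / n) * (cross - (lam[p] + lam[q]) * overlap)


def two_variate_setup(r):
    """Roots and coefficients for two unit-variance variates with correlation r."""
    rho = [[1.0, r], [r, 1.0]]
    lam = [1.0 + r, 1.0 - r]
    c0, c1 = math.sqrt(lam[0] / 2.0), math.sqrt(lam[1] / 2.0)
    a = [[c0, c1], [c0, -c1]]
    return a, rho, lam


if __name__ == "__main__":
    r, n = 0.6, 50
    a, rho, lam = two_variate_setup(r)
    print(root_covariance(a, rho, lam, n, 0, 0), (1 - r * r) ** 2 / n)
    print(root_covariance(a, rho, lam, n, 0, 1), -(1 - r * r) ** 2 / n)
```

In this special case the two printed pairs agree, since the sample roots are linear functions of the single sample correlation $r$.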


4. Asymptotic Variances and Covariances of Coefficients of Principal Components Derived from a Determinant of Covariances. Let $x_i = a_{ij}y_j$ be the equation of transformation of the variates $x_1, x_2, \cdots, x_s$ into principal components. In what follows we assume that the latent roots of the determinantal equation (3.3) are simple and none equal to zero. The last restriction makes the determinant of covariances non-vanishing. The determinant of the $a_{ij}$'s will therefore be also different from zero. With these assumptions in mind, we now proceed to derive the asymptotic variances and covariances of the $a_{ij}$'s.

We set $p = q$ in (3.2) and differentiate the result. This yields:

(3.27) $d\lambda_p = 2a_{lp}\,da_{lp}$, [p]

where the summation index $i$ was replaced by $l$. Substituting for $d\lambda_p$ from (3.8) we get:

(3.28) $a_{ip}a_{jp}\,d\sigma_{ij} = 2\lambda_p a_{lp}\,da_{lp}$. [p]

Now, when $p \ne q$, equation (3.7) reduces to

$$\lambda_p a_{jp}\,da_{jq} + a_{ip}a_{jq}\,d\sigma_{ij} = \lambda_q a_{ip}\,da_{iq}, \qquad [p, q]$$

or

(3.29) $a_{ip}a_{jq}\,d\sigma_{ij} = (\lambda_q - \lambda_p)a_{lp}\,da_{lq}$. [p, q]

We combine equations (3.28) and (3.29) into one equation

(3.30) $a_{ip}a_{jq}\,d\sigma_{ij} = (\lambda_q + \epsilon_{pq}\lambda_p)a_{lp}\,da_{lq}$, [p, q]

where $\epsilon_{pq}$ has the value 1 when $p = q$ and $-1$ when $p \ne q$. The reciprocal of $\lambda_q + \epsilon_{pq}\lambda_p$ (which is different from zero) we denote by $b_{qp}$. Then equation (3.30) can be written as

(3.31) $a_{ip}b_{qp}a_{jq}\,d\sigma_{ij} = a_{lp}\,da_{lq}$. [p, q]


Since the determinant $|a_{ij}|$ of the $a_{ij}$'s is different from zero, we can solve this set of homogeneous linear equations for the $da_{lq}$'s $(l = 1, 2, \cdots, s)$. To do this we multiply equation (3.31) by $A^{tp}$, where $A^{tp}$ is the element of the $t$-th row and $p$-th column of the inverse of the determinant $|a_{ij}|$, and sum with respect to $p$. Since $A^{tp}a_{lp} = \delta_{tl}$ we get

(3.32) $A^{tp}a_{ip}b_{qp}a_{jq}\,d\sigma_{ij} = \delta_{tl}\,da_{lq} = da_{tq}$. [q]

We now change the subscripts $i, j, t, p, q$ in (3.32) to $k, m, r, u, v$, respectively, multiply the two equations thus obtained, and take the expected value:

(3.33) $E(da_{tq}\,da_{rv}) = A^{tp}A^{ru}a_{ip}a_{ku}b_{qp}b_{vu}a_{jq}a_{mv}E(d\sigma_{ij}\,d\sigma_{km})$. [q, v]

Substituting for $E(d\sigma_{ij}\,d\sigma_{km})$ its value from (3.11) and simplifying by means of (3.4) we get:


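(3.34) $E(da_{tq}\,da_{rv}) = \dfrac{\lambda_q^2\lambda_v^2}{n}A^{tv}A^{rq}b_{qv}b_{vq} + \dfrac{\lambda_q^2\delta_{qv}}{n}\sum_{u=1}^{s}\lambda_u^2A^{tu}A^{ru}b_{qu}b_{vu}$

where we sum only with respect to $u$. We may simplify this formula to some extent by employing the relation $A^{tp} = a_{tp}/\lambda_p$. (This relation is obtained from (3.2) by multiplying each side of that equation by $A^{tp}$ and summing with respect to $p$.) When this is done and the values for the $b$'s are substituted, the final result becomes:

(3.35) $E(da_{tq}\,da_{rv}) = \dfrac{\lambda_q\lambda_v}{n}\,\dfrac{a_{tv}a_{rq}}{(\lambda_q + \epsilon_{qv}\lambda_v)(\lambda_v + \epsilon_{qv}\lambda_q)} + \dfrac{\lambda_q^2\delta_{qv}}{n}\sum_{u}\dfrac{a_{tu}a_{ru}}{(\lambda_q + \epsilon_{qu}\lambda_u)(\lambda_v + \epsilon_{vu}\lambda_u)}$.

From this we derive the following specific formulas:

(3.36) $E(da_{tq}\,da_{rq}) = \dfrac{a_{tq}a_{rq}}{4n} + \dfrac{\lambda_q^2}{n}\left[\dfrac{a_{t1}a_{r1}}{(\lambda_q - \lambda_1)^2} + \cdots + \dfrac{a_{tq}a_{rq}}{4\lambda_q^2} + \cdots + \dfrac{a_{ts}a_{rs}}{(\lambda_q - \lambda_s)^2}\right]$

(3.37) $E[(da_{tq})^2] = \dfrac{a_{tq}^2}{4n} + \dfrac{\lambda_q^2}{n}\left[\dfrac{a_{t1}^2}{(\lambda_q - \lambda_1)^2} + \cdots + \dfrac{a_{tq}^2}{4\lambda_q^2} + \cdots + \dfrac{a_{ts}^2}{(\lambda_q - \lambda_s)^2}\right]$

(3.38) $E(da_{tq}\,da_{rv}) = -\dfrac{\lambda_q\lambda_v\,a_{tv}a_{rq}}{n(\lambda_q - \lambda_v)^2}$ $(q \ne v)$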


Formulas (3.36), (3.37) and (3.38) give us the leading terms of the asymptotic 
expansions of the variances and covariances for the principal components. It 
should be remarked that the coefficients of ‘“‘mutual regression’’ equations can 
be easily shown to be proportional to those of the principal components. Hence 
their asymptotic standard errors and covariances may be derived in a similar 
manner and will be of the same form. 
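The index bookkeeping in the general formula (3.35) is easy to get wrong, so a small numerical sketch may help. It evaluates (3.35) directly; the function names `eps` and `coeff_cov`, the 0-based indices, and the diagonal two-variate example are assumptions made for illustration, with the coefficients scaled so that column $p$ of $a$ has squared length $\lambda_p$.

```python
def eps(p, q):
    """epsilon_pq of (3.30): +1 when p == q, -1 otherwise."""
    return 1.0 if p == q else -1.0


def coeff_cov(a, lam, n, t, q, r, v):
    """Leading term of E(da_tq da_rv) as given by (3.35); indices 0-based."""
    first = (lam[q] * lam[v] / n) * a[t][v] * a[r][q] / (
        (lam[q] + eps(q, v) * lam[v]) * (lam[v] + eps(q, v) * lam[q]))
    second = 0.0
    if q == v:  # the delta_qv factor removes the sum when q != v
        for u in range(len(lam)):
            second += a[t][u] * a[r][u] / (
                (lam[q] + eps(q, u) * lam[u]) * (lam[v] + eps(v, u) * lam[u]))
        second *= lam[q] ** 2 / n
    return first + second


if __name__ == "__main__":
    # Two uncorrelated variates with variances 4 and 1.
    lam = [4.0, 1.0]
    a = [[2.0, 0.0], [0.0, 1.0]]
    n = 100
    print(coeff_cov(a, lam, n, 0, 0, 0, 0))  # variance of a_11, case (3.37)
    print(coeff_cov(a, lam, n, 1, 0, 1, 0))  # variance of a_21, case (3.37)
    print(coeff_cov(a, lam, n, 1, 0, 0, 1))  # covariance of a_21 and a_12, case (3.38)
```

For this example the three quantities work out by hand to $2/n$, $16/(9n)$ and $-8/(9n)$; the first agrees with the variance of $\sqrt{l_1}$ implied by $\sigma_{l}^2 = 2\lambda^2/n$.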


5. Variances and Covariances of Latent Roots when the Population Roots are Equal. Let $k_1, k_2, \cdots, k_p$ be the latent roots of a generalized sample variance of $p$ normally distributed variates.

Ordinarily the subscripts of the roots designate their ranks, so that $k_1 > k_2 > \cdots > k_p$. We may, however, assign to a root a subscript from 1 to $p$ without any regard to its size.$^{11}$ If this is done randomly for every sample of $n$ observations the mathematical expectation of $k_ik_jk_h \cdots$ will be the same for every permutation of the subscripts $i, j, h, \cdots$. This fact permits us to calculate the variances and covariances of the above roots.

We may assume, without any loss of generality, that the $p$ variates are independently distributed,$^{12}$ and furthermore we assume the population roots to be all equal to unity. Then equation (3.11) becomes


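(3.39) $E(s_{ij}s_{km}) = \delta_{ij}\delta_{km} + \dfrac{1}{n}(\delta_{ik}\delta_{jm} + \delta_{im}\delta_{jk})$,

where $s_{ij}$ is the sample covariance of $x_i$ and $x_j$ and $\delta_{ij}$ is the Kronecker delta. Now it can be easily shown that

(3.40) $\sum_{1}^{p} s_{ii} = \sum_{1}^{p} k_i$, $\quad\sum_{i<j}(s_{ii}s_{jj} - s_{ij}^2) = \sum_{i<j} k_ik_j$, $\quad\sum_{1}^{p} s_{ii}^2 + 2\sum_{i<j} s_{ij}^2 = \sum_{1}^{p} k_i^2$.

Hence $E(k) = 1$, and

$$E\Big(\sum k_i^2\Big) = E\Big(\sum s_{ii}^2 + 2\sum_{i<j} s_{ij}^2\Big)$$

or

(3.41) $pE(k^2) = pE(s_{ii}^2) + p(p - 1)E(s_{ij}^2)$. $(i \ne j)$

Substituting from (3.39) in (3.41) we get

$$E(k^2) = 1 + \frac{p + 1}{n}.$$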





11 This approach was suggested to the author by Professor Hotelling.
12 See Part II, last paragraph.






The variance of $k$ is therefore given exactly by

(3.42) $\sigma_k^2 = E(k^2) - 1 = \dfrac{p + 1}{n}$.

In a similar manner we find the covariances of $k_i$ and $k_j$ to be

(3.43) $\sigma_{k_ik_j} = -\dfrac{1}{n}$.
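The random-labelling argument of this section lends itself to a quick simulation for $p = 2$, where the roots of a $2 \times 2$ symmetric matrix have a closed form. This is a minimal sketch under stated assumptions: the population means are treated as known (zero), so that the moments of the $s_{ij}$ take exactly the form used for unit population roots, and the name `simulate_root_moments` is illustrative only.

```python
import math
import random


def simulate_root_moments(n, reps=20000, seed=1):
    """Monte Carlo variance and covariance of randomly labelled latent roots.

    Two independent unit-variance normal variates; s_ij = (1/n) sum x_i x_j.
    Returns (variance of k_1, covariance of k_1 and k_2).
    """
    rng = random.Random(seed)
    sum_k1 = sum_k2 = sum_k1sq = sum_k1k2 = 0.0
    for _ in range(reps):
        s11 = s22 = s12 = 0.0
        for _ in range(n):
            x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            s11 += x * x
            s22 += y * y
            s12 += x * y
        s11, s22, s12 = s11 / n, s22 / n, s12 / n
        # Closed-form roots of the 2x2 symmetric matrix [[s11, s12], [s12, s22]].
        mean = (s11 + s22) / 2.0
        half = math.hypot((s11 - s22) / 2.0, s12)
        k1, k2 = mean + half, mean - half
        # Assign subscripts at random, without regard to size.
        if rng.random() < 0.5:
            k1, k2 = k2, k1
        sum_k1 += k1
        sum_k2 += k2
        sum_k1sq += k1 * k1
        sum_k1k2 += k1 * k2
    m1, m2 = sum_k1 / reps, sum_k2 / reps
    return sum_k1sq / reps - m1 * m1, sum_k1k2 / reps - m1 * m2


if __name__ == "__main__":
    n, p = 20, 2
    var_k, cov_k = simulate_root_moments(n)
    print(var_k, (p + 1) / n)   # compare with (3.42)
    print(cov_k, -1 / n)        # compare with (3.43)
```

Although each ordered root is biased away from unity, the randomly labelled roots reproduce the variance $(p+1)/n$ and the covariance $-1/n$ up to simulation error.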

IV. DISTRIBUTION AND MOMENTS OF QUANTITIES RELATED TO
q AND z


From the known distribution of $q$ and $z$ and their expressions in terms of the ratio of determinants given by (1.1) and (1.12), we can derive moments and distributions of several related functions of sample variances and correlations of two independent sets of variates.


| bis | 
| cx; | 
Since the two determinants in (4.1) are independently distributed, the sampling distribution of $p$, given in the above form, can be obtained for a general value of $s$ and $t$ from Wilks'$^{13}$ distribution of the ratio of independent generalized variances. Thus, for $s = 2$ and $t > 2$, the distribution of $p$ is given by


2 
(4.1) Let p = = by (1.12). 


I'(n — 2) $(t—3) dp 
as Pe — Ifa — ¢ ~ 1) P (1 + Vp)" 


When the number of variates in each set is the same, the numerator of $q^2$ in (1.1) becomes the square of the determinant of covariances between the two sets of variates. Thus

(4.3) $q^2 = \dfrac{|a_{i\alpha}|^2}{|a_{ij}|\,|a_{\alpha\beta}|}$

where $i, j$ take on values from 1 to $s$, $\alpha, \beta$ take on values from $s + 1$ to $2s$, and $a_{uv} = \sum_{1}^{n} x_ux_v$.
1 


If the two sets are independent, the quantities $q^2$, $|a_{ij}|$, $|a_{\alpha\beta}|$ are independently distributed. Hence


(4.4) E(| dia |") = Eq”(| ag; |") E(| aes |*"). 


Setting 6 = 0 in (1.16) and employing formula (1.15) we get for the moment of 
| Gia | 


s+m—tt \n(” +a~ its) 


(4.5) | aia") = aim a LI- s—t+1\,.fe-1 +1 
EC) 


13 Loc. cit., pp. 478-479. 




























where $A^{uv}$ denotes the cofactor corresponding to $\sigma_{uv}$ divided by the determinant $|\sigma_{uv}|$, $\sigma_{uv}$ being the population covariance of $x_u$ and $x_v$.

We may replace the product sums in (4.3) by sample correlations and, with the assumption that all the variates come from independent populations, obtain the $m$-th moment of the determinant of correlations between the two sets as


(2) r (* +m—it+ ' r ( +m—it+ ') 
2 T 2 2 





(4.6) E(\ rie") = —7 I] eas a 
O rs I r(” 
2 2 2 
1e This follows from the expression for the m‘" moment of q and the formula 
id re) . (; +2k—it+ ‘) 
ns a 8 
: 2 2 
7 E uv F = ‘ . - 
(4.7) (\ ru {*) -, (wt Bk I (aati 
2 2 
derived by Wilks.” 

If we set $s = t = 2$, the numerator of $q^2$ in (4.3) becomes the square of a determinant of sample covariances (or correlations) known to psychologists as the tetrad. We shall here derive its distribution under the assumption that the four variates are independently distributed.

We write

(4.8) $q = \dfrac{T}{u_1u_2}$

where

(4.9) $T = r_{13}r_{24} - r_{14}r_{23}$, $\quad u_1 = (1 - r_{12}^2)^{1/2}$, $\quad u_2 = (1 - r_{34}^2)^{1/2}$

and $q$ is taken as positive.

Now the distribution of $q$ for $s = t = 2$ is given by

(4.10) $(n - 2)(1 - q)^{n-3}\,dq$

and the distribution of u is known to be 
de- or (*) 
(4.11) < i u”?1 — uw?) ? du. 
ver("5") 
t of 


Hence the distribution of u,wv2 and q is given by 
“() 
(4.12) = 2) . 2 i (1 — q)” "(uu)" “(C1 — u?)(1 — ud)? dur duedg. 
r(S) 
2 


14 Loc. cit., p. 492. 


Performing the transformation (4.8) and integrating out $u_1$ and $u_2$ we get for the distribution of the tetrad


ofn 

4(n — 2)P (*) ' (uju. — T)" 

(4.13) — du, dus. 
rT? (" - ') rte V(1 — ui) (1 — 8) 

2 uy 


All the moments of $T$ can of course be obtained by setting $s = 2$ in (4.6).$^{15}$


U.S. DEPARTMENT OF AGRICULTURE, 
WasHINGTON, D. C. 


15 The limiting distribution of the tetrad was given by J. L. Doob in an article entitled "The Limiting Distributions of Certain Statistics," Annals of Mathematical Statistics, Vol. 6, (1935). For a more general distribution of the tetrad and other statistics considered in this paper see W. G. Madow, "Contributions to the Theory of Multivariate Statistical Analysis," Transactions of the American Mathematical Society, Nov. 1938.





AN OPTIMUM PROPERTY OF CONFIDENCE REGIONS ASSOCIATED
WITH THE LIKELIHOOD FUNCTION$^1$


By S. S. WILKS AND J. F. DALY


One of the authors [1] has recently established a connection between the method of maximum likelihood and shortest average confidence intervals for the case of one unknown parameter, and has reported a generalization [2] of this result for the case of several parameters. It is the object of this paper to consider the several-parameter problem in greater detail and at the same time to make the previously obtained result slightly stronger, particularly in the one-parameter case.

Let $x$ denote a set of random variables, and $\theta$ a set of parameters $\theta_1, \cdots, \theta_h$. Suppose $\Pi_\theta$ is a population with the cumulative distribution function $F(x, \theta) = F_\theta$ say. Then the logarithm of the likelihood associated with the population $\Pi_\theta$ of random samples $O_n: x_1, x_2, \cdots, x_n$ drawn from $\Pi_\theta$ is

$$L^n(x, \theta) = \sum_{\alpha=1}^{n} \log dF(x_\alpha, \theta).$$
a=l1 


For a given sample $O_n$ we shall say that a set of functions $H_i^n(x, \theta)$ is of class $K$ if there exists a domain $R$ of parameter points $\theta: (\theta_1, \cdots, \theta_h)$ in a $\theta$-space such that for each $\theta_0$ in $R$:

(i) $H_i^n(x, \theta_0) = H_{i0}^n$ is of the form $\sum_{\alpha=1}^{n} h_i(x_\alpha, \theta_0)$;
(ii) $h_i(x, \theta_0) = h_{i0}$ exists for all $x$ except possibly for a set of zero probability;
(iii) $E_0[h_{i0}] = 0$, where $E_0$ means that the expected value is taken for the population $\Pi_0$;
(iv) $\|E_0[h_{i0}h_{j0}]\|$ exists and is non-singular;
(v) the moments $E_0[h_{i0}h_{j0}h_{k0}]$ are all finite.

(Here and throughout the remainder of the paper, the indices $i, j, k, l$ have the range $1, \cdots, h$.) If, in addition,

(iii') $E_0[h_{i0}]$ can be differentiated under the integral sign;
(iv') the moments $E_0[h_{i0}h_{j0}]$ are differentiable with respect to the $\theta$'s;

the $H_i$ will be said to be of class $K'$.

We shall need the following lemma, which is very closely related to Theorem 1’ 
and Theorem 2 in [1] and which can be proved by the method of characteristic 
functions. 


1 Incorporated in this paper is a note presented by one of us (cf. [2]) at a meeting of the Institute of Mathematical Statistics, December 27, 1938.


LEMMA: Let $H_i^n(x, \theta)$ be of class $K$ for each $n$, and put

$$B_{ij0} = \frac{1}{n}E_0[H_{i0}^nH_{j0}^n] = E_0[h_{i0}h_{j0}].$$

Let $\|b_{ij0}\|$ be the positive definite matrix satisfying the equation

$$\|b_{ij0}\|^2 = \|B_{ij0}\|$$

and write

$$\|b_0^{ij}\| = \|b_{ij0}\|^{-1}.$$

Then for any point $\theta_0$ in $R$ the functions

(1) $\psi_{ni} = \dfrac{1}{\sqrt{n}}\sum_{j=1}^{h} b_0^{ij}H_{j0}^n$

computed from $\Pi_0$ have a joint distribution which converges in large samples to normality, with the density function

$$(2\pi)^{-h/2}\,e^{-\frac{1}{2}\sum_{i=1}^{h}\psi_{i}^2}.$$

Now whenever we are justified in assuming a definite functional form for $F(x, \theta)$, and have a set of functions $\phi_i(x, \theta)$ whose distribution under this last assumption is known and is independent of the $\theta$'s, as is the case in the limit for the functions (1), we can obtain, from a sample, information about the values of the $\theta$'s. For, given any region $S$ in the space of the functions $\phi_i$, we can determine the probability $P_0\{\phi_{i0} \subset S\}$ that in samples from $\Pi_0$ the point $(\phi_{10}, \cdots, \phi_{h0})$ will fall in the region $S$, even though we do not know the population values $\theta_0$. Suppose, then, that we pick a region $S$ such that $P_0\{\phi_{i0} \subset S\} > .95$, and agree that each time we encounter such a problem we shall substitute the observed $x$'s into the $\phi$'s, and call the set of all points $(\theta_1, \cdots, \theta_h)$ for which $\phi_i(x, \theta) \subset S$ the confidence region $T$. If this procedure is followed consistently, we can assert that the probability is more than .95 that the region $T$ thus determined contains the true parameter point $\theta_0$.

Evidently the size of the confidence region, i.e., the accuracy with which it serves to locate the true parameter point $\theta_0$, depends upon our choice of the auxiliary functions $\phi_i$. Consider now the case in which there is but one parameter $\theta$, and let $\phi(x, \theta)$ and $\phi^*(x, \theta)$ be two functions with the same distribution $D(u)$, where $D(u)$ does not depend on $\theta$. For the set $S$ of the above discussion take the interval $u' < u < u''$. Then

$$P_0\{\phi_0 \subset S\} = P_0\{\phi_0^* \subset S\} = \alpha$$

where $\alpha = .95$, say. Given a set of observed $x$'s, $\phi(x, \theta)$ will map $S$ into a confidence region $T$, while $\phi^*(x, \theta)$ will map it into a confidence region $T^*$. Both $T$ and $T^*$ may be expected to contain the true value $\theta_0$ in 95% of the cases; hence a reasonable way to compare the size of $T$ with that of $T^*$ is to compare the quantities $\dfrac{\partial\phi}{\partial\theta}(x, \theta_0)$ and $\dfrac{\partial\phi^*}{\partial\theta}(x, \theta_0)$; for these derivatives give an indication of the amount of change one can make in $\theta$ without forcing $\phi$ or $\phi^*$ out of the interval $S$. The result obtained in [1] in this connection may now be stated as follows:


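Let $H = \dfrac{\partial L}{\partial\theta}$ be of class $K'$, and let $H^* = \sum_{\alpha=1}^{n} h(x_\alpha, \theta)$ be any other function of class $K'$. Then in large samples from $\Pi_0$ both

$$\phi = \frac{H}{\left(nE\left[\left(\dfrac{\partial}{\partial\theta}\log dF\right)^2\right]\right)^{1/2}}, \qquad \phi^* = \frac{H^*}{\left(nE[\{h(x, \theta)\}^2]\right)^{1/2}}$$

are distributed almost normally with zero mean and unit variance. But the confidence regions obtained from $\phi$ will, on the average, be smaller than those from $\phi^*$, in the sense that, for large samples the inequality

(2) $\left|E\left(\dfrac{\partial\phi}{\partial\theta}\right)\right| > \left|E\left(\dfrac{\partial\phi^*}{\partial\theta}\right)\right|$

will hold (unless $h(x, \theta) = c\,\dfrac{\partial}{\partial\theta}\log dF$, in which case alone the inequality (2) becomes an equality).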
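The one-parameter comparison can be made concrete with a normal population of unknown mean $\theta$ and unit variance. For a standardized function built from $h$, the expected slope with respect to $\theta$ is $\sqrt{n}$ times $E[\partial h/\partial\theta]/(E[h^2])^{1/2}$, so it suffices to compare this per-observation ratio. The sketch below is illustrative only: the alternative $h^* = (x - \theta)^3$, the trapezoidal integration, and the names `normal_expectation` and `sensitivity` are assumptions chosen for the example, not part of the paper.

```python
import math


def normal_expectation(g, lo=-12.0, hi=12.0, steps=24000):
    """E[g(Z)] for Z ~ N(0, 1), by the trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * width
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * width / math.sqrt(2.0 * math.pi)


def sensitivity(h, dh_dtheta):
    """|E[dh/dtheta]| / sqrt(E[h^2]) at the true theta, written in z = x - theta."""
    second_moment = normal_expectation(lambda z: h(z) ** 2)
    return abs(normal_expectation(dh_dtheta)) / math.sqrt(second_moment)


# Likelihood-derived function: h = d/dtheta log dF = x - theta.
score_sensitivity = sensitivity(lambda z: z, lambda z: -1.0)
# A competing class-K' style function (illustrative): h* = (x - theta)^3.
cubic_sensitivity = sensitivity(lambda z: z ** 3, lambda z: -3.0 * z * z)

if __name__ == "__main__":
    print(score_sensitivity, cubic_sensitivity)
```

Here the likelihood-derived function attains the value 1, while the cubic alternative gives $3/\sqrt{15}$, so intervals obtained from the latter are on the average wider.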
Now let us return to the several-parameter case. One method of attack which suggests itself is to consider the jacobian determinant

$$\left|\frac{\partial\phi_{i0}}{\partial\theta_j}\right|$$

for this bears the same relation to the area of the region $dS$ which maps into the region

$$dT:\quad \theta_{i0} - \tfrac{1}{2}d\theta_i < \theta_i < \theta_{i0} + \tfrac{1}{2}d\theta_i$$

as does the derivative $\dfrac{\partial\phi_0}{\partial\theta}$ in the one parameter case. To this end, let us put $L_i^n(x, \theta) = \dfrac{\partial L^n}{\partial\theta_i}$, and for each $n$ and for each $\theta_0$ in $R$ assume that

(a) $L_{i0}^n$ is defined for all $x$ except perhaps on a set of probability 0;
(c) $E_0[L_{i0}^n]$ can be differentiated under the integral sign;
(d) $\|E_0[L_{i0}^nL_{j0}^n]\|$ exists and is non-singular;
(e) $E_0[L_{i0}^nL_{j0}^n]$ is differentiable in the $\theta$'s.

Let $H_i^n(x, \theta)$ be any other set of functions satisfying the same conditions. Set

$$E_0[L_{i0}^nL_{j0}^n] = nA_{ij0}, \qquad E_0[H_{i0}^nH_{j0}^n] = nB_{ij0}$$










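and define the matrices

$$\|a_{ij0}\|^2 = \|A_{ij0}\|, \qquad \|a_0^{ij}\| = \|a_{ij0}\|^{-1}$$
$$\|b_{ij0}\|^2 = \|B_{ij0}\|, \qquad \|b_0^{ij}\| = \|b_{ij0}\|^{-1}$$

Now consider the normalized functions

$$\psi_{ni} = \frac{1}{\sqrt{n}}\sum_{j=1}^{h} a^{ij}L_j^n, \qquad \psi_{ni}^* = \frac{1}{\sqrt{n}}\sum_{j=1}^{h} b^{ij}H_j^n.$$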


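We then have

(3) $\dfrac{1}{\sqrt{n}}\dfrac{\partial\psi_{ni}}{\partial\theta_k} = \sum_{j=1}^{h}\dfrac{\partial a^{ij}}{\partial\theta_k}\,\dfrac{1}{n}L_j^n + \sum_{j=1}^{h} a^{ij}\,\dfrac{1}{n}\dfrac{\partial L_j^n}{\partial\theta_k}$

and by virtue of assumptions (b) and (c) it follows that (cf. [1], pp. 171-2)

$$E_0\left[\frac{1}{\sqrt{n}}\frac{\partial\psi_{ni0}}{\partial\theta_k}\right] = -\frac{1}{n}\sum_{j=1}^{h} a_0^{ij}E_0[L_{j0}^nL_{k0}^n].$$

In similar fashion

$$E_0\left[\frac{1}{\sqrt{n}}\frac{\partial\psi_{ni0}^*}{\partial\theta_k}\right] = -\frac{1}{n}\sum_{j=1}^{h} b_0^{ij}E_0[H_{j0}^nL_{k0}^n].$$

Consequently

(4) $(-1)^h\left|E_0\left[\dfrac{1}{\sqrt{n}}\dfrac{\partial\psi_{ni0}}{\partial\theta_j}\right]\right| = |A_{ij0}|^{1/2}$

and

(5) $(-1)^h\left|E_0\left[\dfrac{1}{\sqrt{n}}\dfrac{\partial\psi_{ni0}^*}{\partial\theta_j}\right]\right| = |B_{ij0}|^{-1/2}\left|\dfrac{1}{n}E_0[H_{i0}^nL_{j0}^n]\right|$.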
We can find a relation between these two determinants by going over to the matrix

$$M = \begin{pmatrix} \|E_0[L_{i0}^nL_{j0}^n]\| & \|E_0[L_{i0}^nH_{j0}^n]\| \\ \|E_0[H_{i0}^nL_{j0}^n]\| & \|E_0[H_{i0}^nH_{j0}^n]\| \end{pmatrix}.$$

This matrix is positive definite unless there is a linear relation with constant coefficients, say $\sum_i (c_iL_i + d_iH_i) = 0$, which holds for all $x$'s except a set of zero probability; and in this event it is positive semidefinite. From the theory of compound matrices [3] we can then conclude that the matrix whose elements are the $h$-th order minors of $M$, arranged in lexicographic order on both row and column indices, has the same property, so that

$$|E_0[L_{i0}^nL_{j0}^n]| \cdot |E_0[H_{i0}^nH_{j0}^n]| \ge |E_0[L_{i0}^nH_{j0}^n]|^2.$$





' 







The relations (4) and (5) then imply that

(6) $\left|\det E_0\left[\dfrac{1}{\sqrt{n}}\dfrac{\partial\psi_{ni0}}{\partial\theta_j}\right]\right| \ge \left|\det E_0\left[\dfrac{1}{\sqrt{n}}\dfrac{\partial\psi_{ni0}^*}{\partial\theta_j}\right]\right|$.

It may be observed that no use has been made of the assumption of linearity (i) in deriving (6). And since in the one parameter case the determinants have but one row and column, we see that in this case the result in [1] remains valid for functions of an even more general type than those of class $K'$. In order to give the inequality a statistical meaning it seems necessary, however, to require not only that $H$ and $L$ satisfy (a), $\cdots$, (e) but also that in large samples $\dfrac{1}{\sqrt{n}}H_i$ and $\dfrac{1}{\sqrt{n}}L_i$ tend to be distributed independently of $\theta$, with the same (though not necessarily normal) distribution.

For the case of several parameters the transition from the above determinants 
of expected values to the jacobian determinants requires further argument and 
further assumptions. To begin with, suppose that the L? and H? areof class K’, 
and that 
(vi) the moments Ep | Se Ohio 


06; 0; | are all finite, 


2 
with a corresponding condition on the variances and covariances of 


dF (x, %). Let us put 


ao 
00,00; © 


_ 1 0H% B,| 1 Hie} 
as oes 


n 06; n 00; 


dhio dhio 
ion st i= 
sie 06; , | 06; | 


The characteristic function of the Y;; is 


gn(ti, see, tha) = ¢n(t) = Eo [exp (i >» bij Y;;)] 


= {r[em(i Dun) 


Expanding the exponential in powers of the ¢’s and using (vi), we find that 


er(t) = {1 - o(2)¥ 


lim ¢gn(t) = 1 


no 


so that we have 


uniformly in every finite interval | t;;| < M. A basic theorem on sequences of 
characteristic functions [4] then guarantees that for any e > 0 


y 1 dH% 1 aH%y 
oa a Tae = 0 
- Pst 06; EF 06; | - + 







‘ 1 dH io 
that is to say, that — = converges stochastically to its expected value. Under 
n 00; 
the assumptions of this paragraph the same type of reasoning may be used to 
ie 1 aL io ' 
show that the quantities — Hj), — Lio, and — —— all converge stochastically to 
n n 5 
their respective mean values. It will then follow from equation (3) that the 
a i 1 aLio 
functions — ——"" converge stochastically to the values Zy| — —“ |. In fact, it 
n 06; n 06; 
can be shown [5] that any polynomial in these functions must converge stochasti- 
cally to the same polynomial in their expected values. Hence, given any 


aLio 


e > 0, the probability that the determinant kb 


:, , [1 alin | es 
IIo from the determinant Fal = i by more than e can be made arbitrarily 
1 


small by taking n sufficiently large. Similarly, the determinant . 


_ [1 oH} 
converges stochastically to | Eo ? O41 io 
n 


differs in samples from 
i 


v 06; 


|” 06; 
. Thus, given any two positive num- 

a0, 

bers e, e’, we have the relation 


p,d 1 aLio* LaHio|” >1-é 
"lin a6; n 00; 


(where + indicates the absolute values of the determinants), provided n is 
sufficiently large. 

As in the one parameter case, the restrictions which have been put on the class 
of functions L and H are not entirely necessary. But it is difficult to replace 
them by any other set of conditions which are not obviously ad hoc. Let us 
now summarize the above results. 

THEOREM 1. If the functions $L_i^n$ and $H_i^n$ satisfy the conditions (a), $\cdots$, (e), and if

(f) the functions $\dfrac{1}{n}\dfrac{\partial L_i^n}{\partial\theta_j}$ and $\dfrac{1}{n}\dfrac{\partial H_i^n}{\partial\theta_j}$ converge stochastically to their mean values;

(g) the large sample distribution of the functions $\dfrac{1}{\sqrt{n}}L_i^n$ is the same as that of the functions $\dfrac{1}{\sqrt{n}}H_i^n$ and is independent of the $\theta$'s;

then in large samples the confidence regions derived from the $L$'s will almost certainly be smaller than those derived from the $H$'s, in the sense that

$$\lim_{n\to\infty} P_0\left\{\left|\frac{1}{n}\frac{\partial L_{i0}^n}{\partial\theta_j}\right|^{+} > \left|\frac{1}{n}\frac{\partial H_{i0}^n}{\partial\theta_j}\right|^{+}\right\} = 1$$

unless there is linear dependence between the $L$'s and $H$'s.

THEOREM 2. The assumptions of Theorem 1 will be satisfied if the $L_i$ and $H_i$ are of class $K'$, are linearly independent, and satisfy (vi).







THEOREM 3. For the case of only one unknown parameter, the relation 


tL an} = Lan |} 
00, 00; 
aL” 


(euatity holding only in case Hy = c =) can be derived under assumptions (a), 
1 


.-+, (e) alone. Its interpretation in terms of smallest average confidence intervals 
depends, however, on whether or not (g) is satisfied. 
At first sight it may appear that the functions

$$\psi_{ni} = \frac{1}{\sqrt{n}}\sum_{j=1}^{h} b^{ij}H_j^n$$

to which these theorems apply are too complicated to be of any practical use, involving as they do the square root of the inverse of the matrix

$$\|B_{ij}\| = \frac{1}{n}\|E[H_i^nH_j^n]\|.$$

But in employing the method of fiducial argument in the several parameter case there is no need to take the region $S$ in the $\psi$ space to be an interval

$$\psi_i' < \psi_i < \psi_i''.$$

Instead, we may take $S$ to be the interior of the sphere
h 
(7) Livi < 
This enables us to avoid the computation of the $b^{ij}$; for

$$\sum_{i=1}^{h}\psi_{ni}^2 = \frac{1}{n}\sum_{i,j,k=1}^{h} b^{ij}b^{ik}H_j^nH_k^n = \frac{1}{n}\sum_{j,k=1}^{h} B^{jk}H_j^nH_k^n$$

where $\|B^{jk}\|$ is the inverse of $\|B_{jk}\|$.

To indicate more precisely how the function $\sum_{i=1}^{h}\psi_{ni}^2$ may be used to determine confidence regions for the parameter point $\theta_0$, we note that if the distribution law of the $\psi_{ni}$ tends to the form

$$(2\pi)^{-h/2}\,e^{-\frac{1}{2}\sum_{i=1}^{h}\psi_i^2}$$

then $\sum\psi_{ni}^2$, which is identically equal to $\dfrac{1}{n}\sum_{i,j} B^{ij}H_i^nH_j^n$, is approximately distributed according to the $\chi^2$ law with $h$ degrees of freedom. We then have

(8) $P\left(\dfrac{1}{n}\sum_{i,j} B^{ij}H_i^nH_j^n < \chi_\alpha^2\right) = \alpha$

approximately, where $\chi_\alpha$ is given by the relation

$$\frac{1}{2^{h/2}\,\Gamma(\tfrac{1}{2}h)}\int_0^{\chi_\alpha^2}(\chi^2)^{\frac{1}{2}h-1}e^{-\frac{1}{2}\chi^2}\,d\chi^2 = \alpha.$$

The confidence region $T$ corresponding to a particular sample $O_n: x_1, x_2, \cdots, x_n$ consists of those points in the $\theta$ space for which $\dfrac{1}{n}\sum_{i,j} B^{ij}H_i^nH_j^n < \chi_\alpha^2$ when the $x$'s are substituted in the $H$'s. Since the region $T$ depends on the sample, it is essentially a random variable and the probability is $\alpha$ that $T$ will include the point $\theta_0$, that is, the point in the $\theta$-space corresponding to the values of the $\theta$'s in the population.


For example, suppose the population $\Pi$ is known to have the multinomial distribution law

$$f(x_0, \cdots, x_h;\ p_0, \cdots, p_h) = p_0^{x_0}\cdots p_h^{x_h}.$$

In this case each $x$ has but two possible values, 0 and 1, and

(9) $\sum_{\nu=0}^{h} x_\nu = \sum_{\nu=0}^{h} p_\nu = 1$.

The likelihood function for random samples $O_n$ drawn from $\Pi$ has for its logarithm

$$L^n = \sum_{\nu=0}^{h} n_\nu\log p_\nu$$

where $n_\nu = \sum_{\alpha=1}^{n} x_{\nu\alpha}$, $x_{\nu\alpha}$ being the value of $x_\nu$ for the $\alpha$-th observation. Because of (9) there are only $h$ independent parameters, say $p_i$ $(i = 1, \cdots, h)$. Thus

$$L_i^n = \frac{n_i}{p_i} - \frac{n_0}{p_0}, \qquad A_{ij} = \frac{\delta_{ij}}{p_i} + \frac{1}{p_0}$$

where $\delta_{ij}$ is unity if $i = j$ and 0 if $i \ne j$. It is not necessary to compute the $a^{ij}$, for, as we have seen,

$$\sum_{i=1}^{h}\psi_{ni}^2 = \frac{1}{n}\sum_{i,j=1}^{h} A^{ij}L_i^nL_j^n.$$

And one can immediately verify that

$$A^{ij} = \delta_{ij}p_i - p_ip_j$$

so that we have

(10) $\sum_{i=1}^{h}\psi_{ni}^2 = \dfrac{1}{n}\sum_{i,j}(\delta_{ij}p_i - p_ip_j)\left(\dfrac{n_i}{p_i} - \dfrac{n_0}{p_0}\right)\left(\dfrac{n_j}{p_j} - \dfrac{n_0}{p_0}\right)$.





Since in this case the $L_i$ satisfy the conditions of the lemma, we know that $\sum_{i=1}^{h}\psi_{ni}^2$ is distributed, in large samples, approximately like $\chi^2$ with $h$ degrees of freedom.

As a matter of fact, (10) is precisely the Pearson $\chi^2$ which is ordinarily used in connection with the problem of deciding whether a sample supports the hypothesis that the population from which it has been drawn has specified values of the $p$'s. For, making use of the fact that

$$\sum_{i=1}^{h}(n_i - np_i) + (n_0 - np_0) = 0$$

we find that

$$\frac{n_i}{p_i} - \frac{n_0}{p_0} = \sum_{j=1}^{h} A_{ij}(n_j - np_j)$$

so that $\sum_{i=1}^{h}\psi_{ni}^2$ reduces to

$$\frac{1}{n}\sum_{i,j=1}^{h} A_{ij}(n_i - np_i)(n_j - np_j) = \sum_{\nu=0}^{h}(n_\nu - np_\nu)^2/np_\nu$$

which is the familiar form. Thus in particular the Pearson $\chi^2$ is the best fiducial function of its type which can be formed from $H$'s satisfying Theorem 1, in the sense that for sufficiently large samples its constituent functions $L_i^n$ will almost certainly have a greater jacobian with respect to the parameters $p_i$ than will the corresponding $H_i^n$ computed from a set of $H_i^n$ independent of the $L_i^n$.
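The identity between the quadratic form (10) and the Pearson $\chi^2$ is easy to confirm numerically. The sketch below evaluates both sides for a set of observed cell counts; the function names and the example counts are illustrative assumptions, and category 0 plays the role of the dependent cell $p_0$.

```python
def quadratic_form_chi2(counts, probs):
    """Evaluate (10): (1/n) * sum_ij A^{ij} L_i L_j with A^{ij} = delta_ij p_i - p_i p_j."""
    n = sum(counts)
    n0, p0 = counts[0], probs[0]
    # Scores L_i = n_i/p_i - n_0/p_0 for the h independent cells i = 1..h.
    scores = [counts[i] / probs[i] - n0 / p0 for i in range(1, len(counts))]
    free_probs = probs[1:]
    total = 0.0
    for i, (p_i, l_i) in enumerate(zip(free_probs, scores)):
        for j, (p_j, l_j) in enumerate(zip(free_probs, scores)):
            a_inv = (p_i if i == j else 0.0) - p_i * p_j
            total += a_inv * l_i * l_j
    return total / n


def pearson_chi2(counts, probs):
    """Familiar form: sum over all cells of (n_v - n p_v)^2 / (n p_v)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


if __name__ == "__main__":
    counts = [18, 30, 52]
    probs = [0.2, 0.3, 0.5]
    print(quadratic_form_chi2(counts, probs), pearson_chi2(counts, probs))
```

For these counts both expressions give 0.28, which would then be referred to the $\chi^2$ distribution with $h = 2$ degrees of freedom as in (8).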

The confidence regions determined by (8) when the H? are replaced by the 
L? have an associated optimum property which may be stated as 


. . ] nijyyn n ° 
THEOREM 4: Let Ao denote the differential of — z. B°"H?H; with respect to the 6;, 
n 6.7 
evaluated at the true parameter point 0. Let Ao be the corresponding differential 


when the H? are replaced by the L; . Let the H; and L; satisfy conditions (i), 
(ii), --- , (vi) and let the mean value of the product of two, three or four factors 


taken from the set § hio , dh io 
06%. 


} be finite, no product containing more than two factors 


lin 


of the type Ohi Let similar assumptions hold for the set Lt, mel where lio = 
a0; v" & 
@ log d's 
006, 


(11) Ey ( a3") ~ (44i) >0 


The equality in (11) will hold for all differential vectors if and only if each hio is a 
linear function of the lio . 


Then if n is sufficiently large 







This theorem can be proved in a straightforward manner by using the follow- 
ing characteristic functions 


h 
ae exp (| D uHh +i yu og 


? 


| exp @ > tihio +7 > Ui; ae 


t,7=1 


exp (4, Debt +i . Uij =) 


1,j7=1 0; 
h alin n 
[ex (iX tle +i Du Ui; ae) |, 
i=1 1,j=1 06; 
where u;; = uji. Now 
h h 
o nt Hi; n 
> —~ NM a0, + = 2, B - “Hi; dO, 
ijk=1 O60, jk= 
with a similar expression for Aj. The problem of mone the mean values 


1 : . y 
E @ a3) and E . 0 ) is a matter of evaluating a set of fourth order deriva- 


tives of gx and gy, at t; = 0, u;; = 0. 
If the appropriate differentiations are carried out it is found that 


E,(As) = an} ) Bro Crio Cio d0;d0; + o(*)| 


$,3,k,l 


E,(do ) = an] D Ajo d6;d0; + o( )| 


4.9K, 0 


where Aijo = E,{liol jol, Biw = Eglhiohil, Crio = Egthio , Lio]. Denoting 


Eo ( 4%) — Ky ¢ a3) by 6, we have 
n n 
S = ue Mio d6;d0; + o(1)} 
1,7 


M;50 || = || Aijo —_ 7 Be Ci ia ||. If the hin and Lio are linearly inde- 
k,l 
pendent then || M;;o || is a positive definite matrix and hence >. Mi; d6; d0; = 8' 
4,7 


say, will be non-negative and can vanish only when all dé; are zero. If each 
hyo is a linear combination of the lio and if the Ayo are linearly independent, then 
each lj is a linear combination of the hy. In this case it can be readily shown 
that every element in || Mj || will vanish, and hence 6’ = 0. 

In case some of the hio are linearly dependent on the 1,o , it can be shown that 
6’ is positive semidefinite, that is, there exists no differential vector for which 6’ 
is negative, although there will exist non-zero differential vectors for which 6 
is zero. 







It can be shown under the assumptions made in Theorem 4 that : (Ao — A3) 
actually converges stochastically to 46’, and thus if the hijo and ia linearly 
independent, the difference = (Aj. — Ab) converges stochastically to a positive 
number. Stated in another way: for sufficiently large samples, the square of 


; : ie iiniia a ae . . 
the differential change in - z. A" L; Lj, for a given change dé@; in the 6; from 
609 


the values 6:0, will almost certainly exceed that. of : 7 eee H;. The sta- 
v ij 

tistical interpretation of this result amounts to the following: by taking suffi- 

ciently large samples, we can make it as certain as we please that the confidence 


regions for locating 4 determined by using : > A"’L?L? in (8) will be smaller 


1,7 


than those determined by using 5 > B’’H?H? in (8). 
i 


REFERENCES 

[1] S. S. Wilks, "Shortest Average Confidence Intervals from Large Samples," Annals of Mathematical Statistics, vol. IX, No. 3 (1938), pp. 166-175.

[2] S. S. Wilks, "Optimum Fiducial Regions for Simultaneous Estimation of Several Population Parameters from Large Samples," (Abstract), Annals of Mathematical Statistics, vol. X, No. 1 (1939), p. 85.

[3] C. C. MacDuffee, The Theory of Matrices, Ergebnisse Series, vol. II, No. 5, Berlin, Julius Springer (1933), p. 87.

[4] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Mathematics and Mathematical Physics, No. 36, Cambridge University Press, 1937.

[5] J. L. Doob, "Limiting Distributions of Certain Statistics," Annals of Mathematical Statistics, vol. VI, No. 3 (1935), p. 164.


PRINCETON UNIVERSITY, PRINCETON, N. J. 





ON SOME PROPERTIES OF MULTIDIMENSIONAL DISTRIBUTIONS 


By J. LUKOMSKI


If, in a system of random variables $x_1, x_2, \cdots, x_n$, some variables are connected by a functional (exact) dependence, the $n$-dimensional distribution law has a degenerate character. In other words, in this case the probability is not distributed over the whole $n$-dimensional space, but is concentrated on a manifold of a smaller number of dimensions which may be called the skeleton of the distribution.

The character and the dimensionality of this manifold are determined by the character and the number of functional connections between the variables $x_1, x_2, \cdots, x_n$. If all these connections are linear, the skeleton will be a linear manifold (hyperplane). The investigation of the skeleton of a distribution is obviously of interest from the theoretical as well as from the practical point of view.

In the present paper we establish some criteria which enable us to determine, for any distribution possessing finite moments of the first and second order, the linear skeleton and to find the variations of the dimensionality of this manifold when the variables are subjected to a linear transformation.$^1$

We also apply the results obtained to the case of a multidimensional normal distribution (generalized by H. Cramér to the case of linear dependence between variables).


§1 
Let

(1)    $x_1, x_2, \dots, x_n$

be a system of random variables defined in the n-dimensional euclidean space
$R_n$ by the multidimensional distribution function $F(x_1, x_2, \dots, x_n)$. The
function F is defined on all Borel sets in $R_n$. We assume the existence of the
following moments:

$$E(x_i) = \int\!\!\int\cdots\int_{R_n} x_i \, dd\cdots dF(x_1, x_2, \dots, x_n),$$

$$E(x_i x_j) = \int\!\!\int\cdots\int_{R_n} x_i x_j \, dd\cdots dF(x_1, x_2, \dots, x_n) = \mu_{ij},$$

where the integrals are to be understood in the sense of Lebesgue-Radon.


¹ The questions of degeneracy of a statistical distribution were first
considered, from a somewhat different point of view, by R. Frisch [1].




























If the variables $x_1, x_2, \dots, x_n$ are connected by a relation of the form
$C_1x_1 + C_2x_2 + \cdots + C_nx_n = 0$ ($\sum C_i^2 \neq 0$), that is, if they are linearly
dependent, we call this relation a linear bond of the distribution F.

We shall call a system of linear bonds of the distribution F complete, if all
bonds of the system are linearly independent and every linear bond of the
distribution depends linearly on the bonds of the system.

By the (linear) decrement of the distribution F (we denote it by k(F) or simply 
k) we understand the number of bonds in a complete system. We may, cor- 
respondingly, call the difference between the number of variables and the 
decrement of the distribution the (linear) rank of the distribution, or the dimen- 
sionality of the linear skeleton. 

The decrement (rank) is given by the following 

THEOREM 1.² The decrement (rank) of the distribution F is equal to the
decrement³ (rank) of the matrix

$$\|\mu_{ij}\|, \qquad i, j = 1, 2, \dots, n$$

of the moments of the second order of this distribution; that is

(2)    $k(F) = k(\|\mu_{ij}\|)$.

PROOF. Consider the form

(3)    $v = t_1x_1 + t_2x_2 + \cdots + t_nx_n$

where $t_1, t_2, \dots, t_n$ are arbitrary real numbers, not all equal to zero.

Let

(4)    $$Q^2 = E(v^2) = \int\!\!\int\cdots\int_{R_n} (t_1x_1 + t_2x_2 + \cdots + t_nx_n)^2 \, dd\cdots dF(x_1, x_2, \dots, x_n)$$
       $$= \sum_{i,j=1}^{n} t_it_j \int\!\!\int\cdots\int_{R_n} x_ix_j \, dd\cdots dF(x_1, x_2, \dots, x_n) = \sum_{i,j=1}^{n} t_it_j\mu_{ij}.$$

$Q^2$ is a non-negative quadratic form in the variables $t_1, t_2, \dots, t_n$. The
system of values $t_1, t_2, \dots, t_n$ for which the expression (3) becomes zero is a
double point of the form $Q^2$.

The coordinates of the double point can be found from the system of
homogeneous equations:

(5)    $$\begin{aligned}
\mu_{11}t_1 + \mu_{12}t_2 + \cdots + \mu_{1n}t_n &= 0 \\
\mu_{21}t_1 + \mu_{22}t_2 + \cdots + \mu_{2n}t_n &= 0 \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
\mu_{n1}t_1 + \mu_{n2}t_2 + \cdots + \mu_{nn}t_n &= 0
\end{aligned}$$














² This theorem was proved by a different method by R. Frisch [1].
³ By the decrement of a (rectangular) matrix we mean, after B. Kagan, the difference
between the number of its rows and its rank.




It is, however, known that the number of independent double points of the
form $Q^2$, i.e. the number of linearly independent nontrivial solutions of the
system (5), is equal to the decrement of the matrix $\|\mu_{ij}\|$, $i, j = 1, 2, \dots, n$.

Consequently, there exist only $k(\|\mu_{ij}\|)$ independent linear connections
between the variables $x_1, x_2, \dots, x_n$, which proves the theorem.

Hence it follows that the variables $x_1, x_2, \dots, x_n$ are linearly independent
($k(F) = 0$) if and only if the form $Q^2$ is positive definite and, consequently, the
discriminant $|\mu_{ij}|$ of the form is positive.
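
As an illustration that is not part of the original paper, Theorem 1 can be checked on a small discrete example. The sketch below assumes a distribution that puts equal probability on a few hypothetical sample points satisfying $x_3 = x_1 + x_2$; the helper names `rank`, `second_moments` and `decrement` are introduced here for illustration only. Exact rational arithmetic is used so that the rank is not disturbed by rounding.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def second_moments(points):
    """mu_ij = E(x_i x_j) for equally likely sample points."""
    n, count = len(points[0]), len(points)
    return [[Fraction(sum(p[i] * p[j] for p in points), count) for j in range(n)]
            for i in range(n)]

def decrement(mu):
    """Number of rows of the moment matrix minus its rank, as in (2)."""
    return len(mu) - rank(mu)

# Hypothetical sample points, all satisfying x3 = x1 + x2.
points = [(1, 0, 1), (0, 1, 1), (2, 1, 3), (-1, 2, 1)]
print(decrement(second_moments(points)))
```

For these points the matrix of second moments has rank 2, so the decrement is 1, in agreement with the single bond $x_1 + x_2 - x_3 = 0$.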

The following two theorems may be used for determination of a complete 
system of linear bonds. The first of them is a special case of the second, but is 
stated separately in order to simplify the proof. 

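THEOREM 2. If k(F) = 1, we obtain the linear bond of the distribution by
replacing in the determinant on the left hand side of the equation

(6)    $$\begin{vmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\
\vdots & \vdots & & \vdots \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{nn}
\end{vmatrix} = 0$$

the elements of one (arbitrary) row by $x_1, x_2, \dots, x_n$ respectively.
For instance, replacing the first row, we have

(7)    $$\begin{vmatrix}
x_1 & x_2 & \cdots & x_n \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\
\vdots & \vdots & & \vdots \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{nn}
\end{vmatrix} = 0.$$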


PROOF. Since the decrement of the matrix $\|\mu_{ij}\|$, $i, j = 1, 2, \dots, n$, is equal
to 1, for the unique nontrivial independent solution $(t_1, t_2, \dots, t_n)$ of the
system (5) may be taken, as we know, the system of algebraical supplements
(cofactors) of the elements of any row of the determinant $|\mu_{ij}|$,
$i, j = 1, 2, \dots, n$. (Among the algebraical supplements of elements of each row
there is at least one different from zero, since the algebraical supplements of
corresponding elements of any pair of rows are proportional to each other.)

Hence, since $t_1x_1 + t_2x_2 + \cdots + t_nx_n = 0$, the theorem follows.
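
The construction of Theorem 2 can be sketched numerically. The following Python fragment is an illustration added here, not the author's computation: it assumes equally likely hypothetical sample points with the single bond $x_3 = x_1 + x_2$, and the names `det`, `second_moments` and `bond_from_row` are chosen for this sketch. The cofactors of one row of $|\mu_{ij}|$ are taken as the coefficients $t_j$ of the bond.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def second_moments(points):
    """mu_ij = E(x_i x_j) for equally likely sample points."""
    n, count = len(points[0]), len(points)
    return [[Fraction(sum(p[i] * p[j] for p in points), count) for j in range(n)]
            for i in range(n)]

def bond_from_row(mu, row=0):
    """Cofactors of the chosen row of |mu_ij|, used as the coefficients t_j."""
    coeffs = []
    for j in range(len(mu)):
        minor = [r[:j] + r[j + 1:] for i, r in enumerate(mu) if i != row]
        coeffs.append((-1) ** (row + j) * det(minor))
    return coeffs

# Hypothetical sample points with decrement 1 (x3 = x1 + x2).
points = [(1, 0, 1), (0, 1, 1), (2, 1, 3), (-1, 2, 1)]
mu = second_moments(points)
t = bond_from_row(mu, row=0)
print(t)
```

Every sample point then satisfies $t_1x_1 + t_2x_2 + t_3x_3 = 0$, and the coefficients are proportional to $(1, 1, -1)$. The sketch presumes that the chosen row has at least one non-vanishing cofactor.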

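THEOREM 3. If k(F) > 0, we obtain a complete system of linear bonds of the
distribution F by replacing in each of the k equations

(8)    $$\begin{vmatrix}
\mu_{ii} & \mu_{i,k+1} & \cdots & \mu_{in} \\
\mu_{k+1,i} & \mu_{k+1,k+1} & \cdots & \mu_{k+1,n} \\
\vdots & \vdots & & \vdots \\
\mu_{ni} & \mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} = 0, \qquad i = 1, 2, \dots, k,$$

one (arbitrary) row of the determinant by $x_i, x_{k+1}, \dots, x_n$ respectively, where
$x_{k+1}, \dots, x_n$ are chosen in such a way that

$$\begin{vmatrix}
\mu_{k+1,k+1} & \cdots & \mu_{k+1,n} \\
\vdots & & \vdots \\
\mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} \neq 0.$$

Replacing, for example, the first rows, we obtain:

(9)    $$\begin{vmatrix}
x_i & x_{k+1} & \cdots & x_n \\
\mu_{k+1,i} & \mu_{k+1,k+1} & \cdots & \mu_{k+1,n} \\
\vdots & \vdots & & \vdots \\
\mu_{ni} & \mu_{n,k+1} & \cdots & \mu_{nn}
\end{vmatrix} = 0, \qquad i = 1, 2, \dots, k.$$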

PROOF. The theorem is already proved for k(F) = 1 (Theorem 2). We have
to prove it for k(F) > 1.

Let us in the first place show that the matrix $\|\mu_{ij}\|$, $i, j = 1, 2, \dots, n$,
possesses at least one positive chief algebraical supplement (principal minor)
of the order n − k.

In fact, in the system of n variables $x_1, x_2, \dots, x_n$ connected by k
independent linear relations there must exist a subsystem of n − k linearly
independent variables. Let these variables be $x_{k+1}, x_{k+2}, \dots, x_n$. The
determinant of the moments of the second order of this subsystem, $|\mu_{ij}|$,
$i, j = k+1, \dots, n$, is different from zero and, by the property of Gram's
determinants, is positive. Further, each of the subsystems $x_i, x_{k+1}, \dots, x_n$ is
subjected to the distribution law $F_i(x_i, x_{k+1}, \dots, x_n)$ with the decrement $k_i = 1$
and, consequently, by Theorem 2, the relations (9) are satisfied. (Arguing
as before we find that any (not necessarily the first) row in each of the
determinants in (8) may be replaced by $x_i, x_{k+1}, \dots, x_n$.)

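In order to show the independence of the relations (9), write the system (9)
in the form:

(9')    $$\sum_{j=1}^{n} C_{ij}x_j = 0, \qquad i = 1, 2, \dots, k,$$

and consider the matrix of its coefficients:

(10)    $$\begin{pmatrix}
C_{11} & 0 & \cdots & 0 & C_{1,k+1} & C_{1,k+2} & \cdots & C_{1n} \\
0 & C_{22} & \cdots & 0 & C_{2,k+1} & C_{2,k+2} & \cdots & C_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & C_{kk} & C_{k,k+1} & C_{k,k+2} & \cdots & C_{kn}
\end{pmatrix}$$

The matrix (10) has the rank k, since the determinant of order k

$$\begin{vmatrix}
C_{11} & 0 & \cdots & 0 \\
0 & C_{22} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & C_{kk}
\end{vmatrix}$$

belonging to the matrix is positive; this follows from

$$C_{11} = C_{22} = \cdots = C_{kk} = |\mu_{ij}| > 0, \qquad i, j = k+1, \dots, n.$$

Thus the independence of the relations (9) is proved and the theorem is
established.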
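
A small numerical sketch of Theorem 3 follows; it is an added illustration, not part of the paper. It assumes four variables with k = 2 bonds ($x_1 = x_3 + x_4$, $x_2 = x_3 - x_4$), equally likely hypothetical sample points, and that the last n − k variables are the linearly independent ones. The function name `complete_bonds` is introduced here only for the sketch.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def second_moments(points):
    """mu_ij = E(x_i x_j) for equally likely sample points."""
    n, count = len(points[0]), len(points)
    return [[Fraction(sum(p[i] * p[j] for p in points), count) for j in range(n)]
            for i in range(n)]

def complete_bonds(mu, k):
    """Coefficients C_ij of the bonds (9), first rows replaced.

    Assumes the last n - k variables are linearly independent.
    """
    n = len(mu)
    tail = list(range(k, n))
    bonds = []
    for i in range(k):
        idx = [i] + tail                      # subsystem x_i, x_{k+1}, ..., x_n
        sub = [[mu[r][c] for c in idx] for r in idx]
        coeffs = [Fraction(0)] * n
        for pos, var in enumerate(idx):       # cofactors of the first row
            minor = [row[:pos] + row[pos + 1:] for row in sub[1:]]
            coeffs[var] = (-1) ** pos * det(minor)
        bonds.append(coeffs)
    return bonds

# Hypothetical values of (x3, x4); x1 and x2 are then fixed by the two bonds.
base = [(1, 0), (0, 1), (1, 1), (2, -1)]
points = [(a + b, a - b, a, b) for a, b in base]
bonds = complete_bonds(second_moments(points), k=2)
for row in bonds:
    print(row)
```

Each row of coefficients annihilates every sample point, the diagonal coefficients $C_{11}$ and $C_{22}$ coincide and are positive, and the off-diagonal entries among the first k columns vanish, which is the structure of the matrix (10).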


§2 


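In this section we consider the question of the variation of the decrement of
the distribution in the case when the variables are subjected to a linear
transformation.

Let $x_1, x_2, \dots, x_n$ be a system (1) of random variables and

(11)    $$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
u_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}$$

a system of linear forms in the variables (1).

The distribution function of the variables $u_1, u_2, \dots, u_m$ we denote by $F_1$,
the decrement of the distribution by $k(F_1)$, or, shorter, by $k_1$.

The two systems of equations (11) and (9) form together the system:

(12)    $$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \\
0 &= a_{m+1,1}x_1 + a_{m+1,2}x_2 + \cdots + a_{m+1,n}x_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
0 &= a_{m+k,1}x_1 + a_{m+k,2}x_2 + \cdots + a_{m+k,n}x_n
\end{aligned}$$

where the last k equations represent, in new notation, the equations (9).

We call the matrix of the coefficients of the variables in the system (12),
$\|a_{ij}\|$, $i = 1, 2, \dots, m+k$; $j = 1, 2, \dots, n$, the elongated matrix of the
transformation.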

We prove the following

THEOREM 4. The decrement of the distribution $F_1(u_1, u_2, \dots, u_m)$ is equal to the
decrement of the elongated matrix of the transformation:

(13)    $$k(F_1) = k(\|a_{ij}\|), \qquad i = 1, 2, \dots, m+k; \quad j = 1, 2, \dots, n.$$





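PROOF. Consider a system of forms in arbitrary linearly independent
parameters $\xi_1, \xi_2, \dots, \xi_n$:

(14)    $$\begin{aligned}
v_1 &= a_{11}\xi_1 + a_{12}\xi_2 + \cdots + a_{1n}\xi_n \\
v_2 &= a_{21}\xi_1 + a_{22}\xi_2 + \cdots + a_{2n}\xi_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
v_m &= a_{m1}\xi_1 + a_{m2}\xi_2 + \cdots + a_{mn}\xi_n \\
v_{m+1} &= a_{m+1,1}\xi_1 + a_{m+1,2}\xi_2 + \cdots + a_{m+1,n}\xi_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
v_{m+k} &= a_{m+k,1}\xi_1 + a_{m+k,2}\xi_2 + \cdots + a_{m+k,n}\xi_n
\end{aligned}$$

such that the matrix of the system (14) coincides with the elongated matrix of
the transformation.

For

(15)    $v_1 = u_1, \ \dots, \ v_m = u_m, \quad v_{m+1} = 0, \ v_{m+2} = 0, \ \dots, \ v_{m+k} = 0$

the system (14) reduces to the system (12).

If the decrement of the matrix of the system is equal to s, there exist only
m + k − s linearly independent forms $v_i$, and each of the remaining s forms is
a linear combination of the first.

By Steinitz's theorem we can always include in a subsystem of independent
forms the forms $v_{m+1}, \dots, v_{m+k}$ (since these forms are independent).

Denoting all forms of the subsystem by $v_{s+1}, \dots, v_m, v_{m+1}, \dots, v_{m+k}$, let
us write the s relations connecting each of the remaining forms with the forms
of our subsystem in the form:

(16)    $$\begin{aligned}
g_{11}v_1 + g_{1,s+1}v_{s+1} + \cdots + g_{1m}v_m + g_{1,m+1}v_{m+1} + \cdots + g_{1,m+k}v_{m+k} &= 0 \\
g_{22}v_2 + g_{2,s+1}v_{s+1} + \cdots + g_{2m}v_m + g_{2,m+1}v_{m+1} + \cdots + g_{2,m+k}v_{m+k} &= 0 \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
g_{ss}v_s + g_{s,s+1}v_{s+1} + \cdots + g_{sm}v_m + g_{s,m+1}v_{m+1} + \cdots + g_{s,m+k}v_{m+k} &= 0
\end{aligned}$$

where $g_{11}, g_{22}, \dots, g_{ss} \neq 0$.

Assigning to the variables in these equations the values (15) we clearly
obtain s linear relations between the variables $u_1, u_2, \dots, u_m$:

(17)    $$\begin{aligned}
g_{11}u_1 + g_{1,s+1}u_{s+1} + \cdots + g_{1m}u_m &= 0 \\
g_{22}u_2 + g_{2,s+1}u_{s+1} + \cdots + g_{2m}u_m &= 0 \\
\cdots\cdots\cdots\cdots\cdots\cdots & \\
g_{ss}u_s + g_{s,s+1}u_{s+1} + \cdots + g_{sm}u_m &= 0
\end{aligned}$$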

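The equations (17) are linearly independent, since the matrix of the
system (17)

$$\begin{pmatrix}
g_{11} & 0 & \cdots & 0 & g_{1,s+1} & \cdots & g_{1m} \\
0 & g_{22} & \cdots & 0 & g_{2,s+1} & \cdots & g_{2m} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & g_{ss} & g_{s,s+1} & \cdots & g_{sm}
\end{pmatrix}$$

contains the diagonal determinant with the elements $g_{11}, g_{22}, \dots, g_{ss}$,
of the order s, which is different from zero.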

We proceed now to prove that there exists no other linear relation between
the variables $u_i$ linearly independent of the relations (17).

From the equations (17) the variables $u_1, u_2, \dots, u_s$ may be determined as
linear combinations of the variables $u_{s+1}, \dots, u_m$ (we suppose that m > s,
since for m = s the proposition under consideration is trivial).

It is thus to be proved that the variables $u_{s+1}, \dots, u_m$ are linearly
independent (since every new linear relation between the variables $u_1, u_2, \dots, u_m$,
independent of (17), must, after corresponding substitutions, lead to a linear
relation between $u_{s+1}, \dots, u_m$).

In the equations (14) the linearly independent variables $v_{s+1}, \dots, v_{m+k}$ are
linear forms in n linearly independent parameters $\xi_1, \xi_2, \dots, \xi_n$.

We may instead of the $\xi_1, \xi_2, \dots, \xi_n$ take for the system of linearly
independent parameters $v_{m+1}, \dots, v_{m+k}, \xi_{k+1}, \dots, \xi_n$ (changing the indices of the
$\xi$ in an appropriate manner), defining $\xi_1, \xi_2, \dots, \xi_k$ by the system of
equations

$$\begin{aligned}
v_{m+1} &= a_{m+1,1}\xi_1 + \cdots + a_{m+1,n}\xi_n \\
&\cdots\cdots\cdots\cdots\cdots \\
v_{m+k} &= a_{m+k,1}\xi_1 + \cdots + a_{m+k,n}\xi_n
\end{aligned}$$

which is always possible, since the forms $v_{m+1}, \dots, v_{m+k}$ are independent.
Substituting the expressions obtained for the $\xi_1, \xi_2, \dots, \xi_k$ into the forms
$v_{s+1}, \dots, v_m$, we find

(18)    $$\begin{aligned}
v_{s+1} &= \varphi_{s+1}(v_{m+1}, \dots, v_{m+k}) + \psi_{s+1}(\xi_{k+1}, \dots, \xi_n) \\
&\cdots\cdots\cdots\cdots\cdots \\
v_m &= \varphi_m(v_{m+1}, \dots, v_{m+k}) + \psi_m(\xi_{k+1}, \dots, \xi_n)
\end{aligned}$$

where $\varphi$ and $\psi$ are linear forms in the corresponding arguments.
The variables $v_{s+1}, \dots, v_m$ remain, of course, independent.


⁴ The indices of the $\xi$ adequately chosen.










Performing in the equations (18) the substitution (15), we obtain:

(19)    $$\begin{aligned}
u_{s+1} &= \psi_{s+1}(\xi_{k+1}, \dots, \xi_n) \\
&\cdots\cdots\cdots\cdots \\
u_m &= \psi_m(\xi_{k+1}, \dots, \xi_n).
\end{aligned}$$

If there exists a linear dependence between the $u_{s+1}, \dots, u_m$, we can find
$\alpha_{s+1}, \dots, \alpha_m$, not all equal to zero, such that

(20)    $\alpha_{s+1}u_{s+1} + \cdots + \alpha_mu_m = 0.$

Multiplying the equations (18) by the coefficients $\alpha_{s+1}, \dots, \alpha_m$ respectively,
and adding, we obtain, by virtue of (19) and (20),

$$\alpha_{s+1}v_{s+1} + \cdots + \alpha_mv_m = \alpha_{s+1}\varphi_{s+1}(v_{m+1}, \dots, v_{m+k}) + \cdots + \alpha_m\varphi_m(v_{m+1}, \dots, v_{m+k}),$$

i.e. the variables $v_{s+1}, \dots, v_{m+k}$ are linearly dependent, which contradicts the
assumption.

The required proposition is thus proved.

It follows that the s equations (17) form a complete system of bonds of the
distribution $F_1$, which proves our theorem.

The moments of the second order of the distribution $F_1$ are connected with
the moments of the distribution F by the following formulae

$$E(u_iu_j) = E\left[\Big(\sum_{r=1}^{n} a_{ir}x_r\Big)\Big(\sum_{s=1}^{n} a_{js}x_s\Big)\right]
= \sum_{r,s=1}^{n} a_{ir}a_{js}E(x_rx_s) = \sum_{r,s=1}^{n} a_{ir}a_{js}\mu_{rs} \qquad (i, j = 1, 2, \dots, m).$$
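
Theorem 4 and the moment formula above can be compared on a small example. The following sketch is an added illustration under stated assumptions: equally likely hypothetical sample points with the single bond $x_1 + x_2 - x_3 = 0$, and a few hypothetical transformation matrices. The decrement of $F_1$ is computed twice, once from the elongated matrix and once from the moments $E(u_iu_j)$; the names `rank`, `transform_moments` and `check` are introduced here.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def second_moments(points):
    """mu_ij = E(x_i x_j) for equally likely sample points."""
    n, count = len(points[0]), len(points)
    return [[Fraction(sum(p[i] * p[j] for p in points), count) for j in range(n)]
            for i in range(n)]

def transform_moments(a, mu):
    """E(u_i u_j) = sum over r, s of a_ir * a_js * mu_rs."""
    n = len(mu)
    return [[sum(a[i][r] * a[j][s] * mu[r][s] for r in range(n) for s in range(n))
             for j in range(len(a))] for i in range(len(a))]

points = [(1, 0, 1), (0, 1, 1), (2, 1, 3), (-1, 2, 1)]   # x3 = x1 + x2
mu = second_moments(points)
bonds = [[1, 1, -1]]                                      # complete system for F

def check(a):
    """Return (decrement of elongated matrix, decrement of moments of u)."""
    elongated = a + bonds
    mu_u = transform_moments(a, mu)
    return len(elongated) - rank(elongated), len(mu_u) - rank(mu_u)

print(check([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))
```

In each case tried here the two decrements agree, as the theorem asserts.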


§3 


Let the normal law of distribution G (generalized by H. Cramér) be given
by its multidimensional characteristic function [2], [3]:

(21)    $$f(t_1, t_2, \dots, t_n) = \int\!\!\int\cdots\int_{R_n} e^{i(t_1x_1 + t_2x_2 + \cdots + t_nx_n)} \, dd\cdots dG(x_1, x_2, \dots, x_n)$$

(22)    $$= e^{-\frac{1}{2}Q^2},$$

where $Q^2 = \sum_{r,s=1}^{n} c_{rs}t_rt_s$ ($c_{rs} = c_{sr}$) is a non-negative quadratic form in the real
variables $t_1, t_2, \dots, t_n$. (The integrals, as above, are to be understood in the
sense of Lebesgue-Radon.)

As is easily seen, the coefficients $c_{rs}$ are the moments of the second order of
the distribution G, for which

$$\mu_{rs} = E(x_rx_s) = -\left[\frac{\partial^2 f}{\partial t_r \, \partial t_s}\right]_{t=0} = c_{rs}.$$

If $Q^2$ is positive definite, we have a proper normal distribution.
If $Q^2$ is non-negative without being positive definite, the distribution G
possesses a positive decrement.


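The decrement and the linear bonds of the distribution may be determined
from the matrix of the coefficients $\|c_{rs}\|$, $r, s = 1, 2, \dots, n$, on the ground of the
general theorems of §1.

Let, as before,

(11)    $$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
u_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}$$

be a system of linear forms in the variables $x_1, x_2, \dots, x_n$. We shall prove
the following

THEOREM 5. The variables $u_1, u_2, \dots, u_m$ are subject to the generalized
normal distribution law, the decrement of which is equal to the decrement of the
elongated matrix of the transformation

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} \\
a_{m+1,1} & a_{m+1,2} & \cdots & a_{m+1,n} \\
\vdots & \vdots & & \vdots \\
a_{m+k,1} & a_{m+k,2} & \cdots & a_{m+k,n}
\end{pmatrix}.$$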


PROOF. Consider the characteristic function of the distribution
$G_1(u_1, u_2, \dots, u_m)$,

(23)    $$f_1(t_1, t_2, \dots, t_m) = \int\!\!\int\cdots\int_{R_m} e^{i(t_1u_1 + t_2u_2 + \cdots + t_mu_m)} \, dd\cdots dG_1(u_1, u_2, \dots, u_m).$$

Performing in this expression the substitution (11), we obtain

(24)    $$\begin{aligned}
f_1(t_1, t_2, \dots, t_m)
&= \int\!\!\int\cdots\int_{R_n} e^{i\left(t_1\sum_{j=1}^{n} a_{1j}x_j + t_2\sum_{j=1}^{n} a_{2j}x_j + \cdots + t_m\sum_{j=1}^{n} a_{mj}x_j\right)} \, dd\cdots dG_1\Big(\sum_{j=1}^{n} a_{1j}x_j, \sum_{j=1}^{n} a_{2j}x_j, \dots, \sum_{j=1}^{n} a_{mj}x_j\Big) \\
&= \int\!\!\int\cdots\int_{R_n} e^{i\left(x_1\sum_{p=1}^{m} a_{p1}t_p + x_2\sum_{p=1}^{m} a_{p2}t_p + \cdots + x_n\sum_{p=1}^{m} a_{pn}t_p\right)} \, dd\cdots dG(x_1, x_2, \dots, x_n).
\end{aligned}$$

($dd\cdots dG(x_1, x_2, \dots, x_n)$ in the expression (24) does not, in general, coincide
with $dd\cdots dG(x_1, x_2, \dots, x_n)$ in the expression (22).)
Taking into account (22), we obtain

(25)    $$f_1 = e^{-\frac{1}{2}Q_1^2},$$













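where

(26)    $$\begin{aligned}
Q_1^2 &= \sum_{r,s=1}^{n} \Big\{ c_{rs} \Big(\sum_{p=1}^{m} a_{pr}t_p\Big)\Big(\sum_{q=1}^{m} a_{qs}t_q\Big) \Big\} \\
&= \sum_{r,s=1}^{n} \Big\{ c_{rs} \sum_{p,q=1}^{m} a_{pr}a_{qs}t_pt_q \Big\} \\
&= \sum_{p,q=1}^{m} \Big\{ t_pt_q \sum_{r,s=1}^{n} c_{rs}a_{pr}a_{qs} \Big\} = \sum_{p,q=1}^{m} t_pt_q\nu_{pq}.
\end{aligned}$$

$Q_1^2$ is a non-negative quadratic form in $t_1, t_2, \dots, t_m$, the coefficients of
which coincide with the moments of the second order of the distribution
$G_1(u_1, u_2, \dots, u_m)$.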













Consequently, the distribution $G_1$ is a generalized normal distribution.

By Theorem 4 the decrement of the distribution $G_1$ is equal to the decrement
of the matrix $\|a_{pr}\|$, $p = 1, 2, \dots, m+k$; $r = 1, 2, \dots, n$, the last k rows
of which consist of the coefficients of the complete system of linear bonds of
the distribution G.

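Let now $x_1, x_2, \dots, x_n$ be a system of random variables subjected to a
proper Gaussian law. The density function of the distribution of the system is

$$\frac{1}{(2\pi)^{n/2} \prod_{i=1}^{n} \sqrt{\mu_{ii}} \, \sqrt{R}} \; e^{-\frac{1}{2}\chi^2},
\qquad \chi^2 = \frac{1}{R} \sum_{i,j=1}^{n} R_{ij} \frac{x_ix_j}{\sqrt{\mu_{ii}\mu_{jj}}},$$

where

$$R = \begin{vmatrix}
1 & r_{12} & \cdots & r_{1n} \\
r_{21} & 1 & \cdots & r_{2n} \\
\vdots & \vdots & & \vdots \\
r_{n1} & r_{n2} & \cdots & 1
\end{vmatrix},$$

$R_{ij}$ are the algebraical supplements in R, $r_{ij} = \dfrac{\mu_{ij}}{\sqrt{\mu_{ii}\mu_{jj}}}$, and $\chi^2$ is a positive
definite quadratic form in the variables $x_1, x_2, \dots, x_n$.

Again let $u_1, u_2, \dots, u_m$ be a system of linear forms in the variables
$x_1, x_2, \dots, x_n$:

(11)    $$\begin{aligned}
u_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
u_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\cdots\cdots\cdots\cdots\cdots\cdots\cdots \\
u_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}$$

Then from Theorem 5 follows the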


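COROLLARY. The random variables $u_1, u_2, \dots, u_m$ are subject to the
m-dimensional properly normal distribution law of Gauss if and only if the matrix

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}$$

of the system of forms (11) has the rank m.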


REFERENCES


[1] R. Frisch, "Correlation and Scatter in Statistical Variables," Nordisk Statistisk Tidskrift,
vol. 1 (1929), 36-102.

[2] H. Cramér, "Über eine Eigenschaft der normalen Verteilungsfunktion," Math. Zeitschr.,
vol. 41 (1936),

[3] H. Cramér, Random Variables and Probability Distributions, Cambridge 1937.


Moscow, U.S.S.R.










ON A CLASS OF DISTRIBUTIONS THAT APPROACH THE NORMAL
DISTRIBUTION FUNCTION¹


By GEORGE B. DANTZIG















1. Formulation of the Problem. An important property of a sequence of
binomial coefficients is that, when suitably normalized and transformed, it
converges to the normal distribution.² The object of this paper is to exhibit a
large class of other sequences which also possess this property.

The Pascal recurrence formula may be taken as the defining property of the
binomial coefficients. Let the combination of n things taken x at a time be
denoted by $\binom{n}{x}$. If we set $f_n(x) = \frac{1}{2^n}\binom{n}{x}$ for $0 \le x \le n$ and $f_n(x) = 0$ for
$x < 0$ or $x > n$, then $f_n(x)$ is defined for all integers x. With this notation
Pascal's recurrence formula, $\binom{n}{x} = \binom{n-1}{x} + \binom{n-1}{x-1}$, may be written

(1)    $$f_n(x) = \tfrac{1}{2}\,[f_{n-1}(x) + f_{n-1}(x - 1)],$$

where this new form is valid for all integers x extending from $-\infty$ to $+\infty$.
In order to generalize, we may consider a sequence of distributions $f_1(x)$,
$f_2(x), \dots, f_n(x), \dots$ each defined in terms of the preceding one by means of the
recurrence formula

(2)    $$f_n(x) = \frac{1}{a_n + 1}\,[f_{n-1}(x - 0) + f_{n-1}(x - 1) + f_{n-1}(x - 2) + \cdots + f_{n-1}(x - a_n)],$$

where the x are integers, and $a_n$ is a positive integer which may change in value
from one distribution to the next. The problem is to find conditions under
which $f_n(x)$, in normalized form, approaches the normal distribution. The
normalization of $f_n(x)$ is effected by the affine transformation

(3)    $$u = \frac{x - \bar{x}_n}{\sigma_n}, \qquad \varphi_n(u) = f_n(x),$$





1 Presented November 21, 1938 before a joint meeting of the Columbia Mathematics 
Club and the Statistical Seminar of the Graduate School of the Department of Agriculture; 
also December 10, 1938 before a meeting of the American Mathematical Association at the 
University of Maryland. 

2 Due to DeMoivre, 1731. By a variable distribution approaching the normal dis- 
tribution, we mean that the integral under the variable distribution between any two 
limits approaches the corresponding integral under the normal curve. 




where $\bar{x}_n$ and $\sigma_n$ are the mean and standard deviation of the distribution $f_n(x)$.
The normal (cumulative) distribution function is taken in the standard form

(4)    $$\varphi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{1}{2}z^2} \, dz.$$

The theorem whose proof forms the theme of this paper may be stated as
follows:
THEOREM: A necessary and sufficient condition that $\varphi_n(u) \to \varphi(u)$ as $n \to \infty$
is that $\Gamma = 0$, where

(5)    $$\Gamma = \lim_{n \to \infty} \; \sum_{i=2}^{n} \gamma_i^2 \Big/ \Big(\sum_{i=2}^{n} \gamma_i\Big)^2, \qquad \gamma_i = a_i^2 + 2a_i.$$
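
The recurrence (2) and the ratio appearing in (5) are easy to experiment with. The Python sketch below is an added illustration, not part of the paper: the function names `step` and `gamma_ratio` are introduced here, the starting distribution $f_1$ is assumed to put all its mass at x = 0, and `gamma_ratio` evaluates the ratio in (5) for a finite list of values $a_2, \dots, a_n$ rather than the limit itself.

```python
from fractions import Fraction

def step(f_prev, a):
    """One application of recurrence (2): average f_prev over the shifts 0, 1, ..., a.

    f_prev[x] is the probability of the integer x >= 0.
    """
    out = [Fraction(0)] * (len(f_prev) + a)
    for x, p in enumerate(f_prev):
        for shift in range(a + 1):
            out[x + shift] += p / (a + 1)
    return out

def gamma_ratio(a_values):
    """Finite-n value of the ratio in (5), with gamma_i = a_i^2 + 2 a_i."""
    g = [a * a + 2 * a for a in a_values]
    return Fraction(sum(v * v for v in g), sum(g) ** 2)

# Assumed start: f_1 concentrated at x = 0; then a_n = 1 reproduces the
# normalized binomial coefficients of recurrence (1).
f = [Fraction(1)]
for _ in range(3):
    f = step(f, 1)
print(f)
print(gamma_ratio([1] * 10), gamma_ratio([1] * 100))
```

With $a_i = 1$ throughout, $\gamma_i = 3$ and the ratio for n − 1 terms is exactly 1/(n − 1), which tends to zero, in line with the classical binomial case.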


2. Liapounoff Condition; the general case. The recurrence formula (2) is a
special case of the most general linear recurrence formula

(6)    $$f_n(x) = \sum_{i=-\infty}^{+\infty} g_n(i) f_{n-1}(x - i),$$

where $g_n(x)$ are a given set of weight functions generating the sequence $f_1(x)$,
$f_2(x), \dots, f_n(x), \dots$. We may form the recurrence formula (2) by setting

(7)    $$g_n(i) = \frac{1}{a_n + 1} \ \text{ if } 0 \le i \le a_n, \qquad g_n(i) = 0 \ \text{ if } i < 0 \text{ or } i > a_n.$$

Let $F_n(t) = \sum_{x<t} f_n(x)$ express³ the probability that a variable $x_n < t$, where the
distribution function of $x_n$ is defined as $f_n(x)$; and in a similar manner let the
probability that a variable $s_n < t$ be given by $G_n(t) = \sum_{x<t} g_n(x)$. By summing
$f_n(x)$ for all x less than t, we obtain

(8)    $$F_n(t) = \sum_{i=-\infty}^{+\infty} F_{n-1}(t - i)\, g_n(i) = \int_{-\infty}^{+\infty} F_{n-1}(t - z) \, dG_n(z),$$

where we have replaced the summation by a Stieltjes integral. In the latter
form the integral gives, in general, the probability that the sum of two
independent variables $x_{n-1}$ and $s_n$ is less than t. From the above equation we see
that the probability that $x_{n-1} + s_n < t$ is the same as that of $x_n < t$, so that
we may set $x_n = x_{n-1} + s_n$. By iteration one obtains

(9)    $$x_n = s_1 + s_2 + \cdots + s_n$$

for all n. Thus we have established that if a distribution function of a variable
$s_n$ is defined as $g_n(x)$, then the distribution function of the sum
$s_1 + s_2 + \cdots + s_n = x_n$ is $f_n(x)$.
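
The interpretation of the recurrence (6) as the distribution of a sum of independent variables can be illustrated by a direct computation. The sketch below is an addition for illustration only; the names `convolve`, `mean_var` and `uniform_weights` are introduced here, and distributions are assumed to have finite support so that they can be stored as dictionaries. If $x_n = x_{n-1} + s_n$ with independent terms, the means and the variances must add.

```python
from fractions import Fraction

def convolve(g, f_prev):
    """Recurrence (6): f_n(x) = sum over i of g_n(i) * f_{n-1}(x - i)."""
    f = {}
    for i, w in g.items():
        for x, p in f_prev.items():
            f[x + i] = f.get(x + i, Fraction(0)) + w * p
    return f

def mean_var(f):
    """Mean and variance of a finitely supported distribution."""
    m = sum(x * p for x, p in f.items())
    return m, sum((x - m) ** 2 * p for x, p in f.items())

def uniform_weights(a):
    """Weights (7): mass 1/(a + 1) on each of 0, 1, ..., a."""
    return {i: Fraction(1, a + 1) for i in range(a + 1)}

f1 = uniform_weights(1)          # an assumed starting distribution
g2 = uniform_weights(3)          # weights with a_2 = 3
f2 = convolve(g2, f1)
print(mean_var(f1), mean_var(g2), mean_var(f2))
```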


³ The summation extends over all values x less than t.


























The limit of the distribution function of the sum of n independent variables
as $n \to \infty$ has been considered by Laplace, Liapounoff, Lindeberg, and others.
We shall make use of a sufficient condition given by Liapounoff that the
normalized distribution function of $x_n$ approaches $\varphi(u)$.

LAPLACE-LIAPOUNOFF THEOREM:⁴ A sufficient condition for the normalized
distribution function of the sum of n independent variables $s_1, s_2, \dots, s_n$ to
approach the normal distribution function with increasing n is $\Gamma' = 0$, where

(10)    $$\Gamma' = \lim_{n \to \infty} \frac{M_4(1) + M_4(2) + \cdots + M_4(n)}{[M_2(1) + M_2(2) + \cdots + M_2(n)]^2},$$

and where $M_2(k)$ and $M_4(k)$ are defined as the second and fourth moments of $s_k$
whose distribution is $g_k(x)$.

Thus we have shown that if a sequence of distributions $f_n(x)$ is defined by the
general linear recurrence formula (6),

$$f_n(x) = \sum_{i=-\infty}^{+\infty} g_n(i) f_{n-1}(x - i),$$

then a sufficient condition that $\varphi_n(u) \to \varphi(u)$ as $n \to \infty$ is given by $\Gamma' = 0$,
where $\varphi_n(u)$ is the normalized form of $f_n(x)$.












3. Sufficiency of the Condition $\Gamma = 0$. We may simplify the condition
$\Gamma' = 0$ for the more restricted case of a sequence of distributions defined by the
recurrence formula (2). In general, the second and fourth moments of $g_k(x)$
are given by

(11)    $$M_2(k) = \sum_{x=-\infty}^{+\infty} g_k(x)(x - \bar{s}_k)^2, \qquad
M_4(k) = \sum_{x=-\infty}^{+\infty} g_k(x)(x - \bar{s}_k)^4,$$

where $\bar{s}_k$ is the mean value of the distribution. Equations (7) give the special
values of $g_k(x)$; substituting these values in (11), and remembering the Bernoulli
summation by which $1^p + 2^p + 3^p + \cdots + n^p$ may be expressed as a
polynomial in n of degree p + 1, we obtain

(12)    $$\begin{aligned}
M_2(k) &= \frac{1}{a_k + 1} \sum_{x=0}^{a_k} \Big(x - \frac{a_k}{2}\Big)^2 = \frac{1}{12}\,[a_k^2 + 2a_k] = \frac{1}{12}\gamma_k, \\
M_4(k) &= \frac{1}{a_k + 1} \sum_{x=0}^{a_k} \Big(x - \frac{a_k}{2}\Big)^4 = \frac{1}{16}\Big[\frac{1}{5}(a_k^2 + 2a_k)^2 - \frac{4}{15}(a_k^2 + 2a_k)\Big] = \frac{1}{80}\gamma_k^2 - \frac{1}{60}\gamma_k,
\end{aligned}$$

⁴ J. V. Uspensky, Introduction to Mathematical Probability (McGraw-Hill, 1937), pages
284-292; the theorem is proved there by the method of characteristic functions.
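
The closed forms for the moments of the uniform weights, $M_2(k) = \gamma_k/12$ and $M_4(k) = \gamma_k^2/80 - \gamma_k/60$ with $\gamma_k = a_k^2 + 2a_k$, can be confirmed by direct summation. The short Python check below is an added illustration (the function name `central_moment` is introduced here); it uses exact fractions, so equality is tested without rounding.

```python
from fractions import Fraction

def central_moment(a, order):
    """Central moment of the weights 1/(a + 1) placed on 0, 1, ..., a."""
    mean = Fraction(a, 2)
    return sum((x - mean) ** order for x in range(a + 1)) / (a + 1)

def closed_forms(a):
    """The expressions gamma/12 and gamma^2/80 - gamma/60 for a given a."""
    gamma = a * a + 2 * a
    return Fraction(gamma, 12), Fraction(gamma * gamma, 80) - Fraction(gamma, 60)

for a in (1, 2, 5):
    print(a, central_moment(a, 2), central_moment(a, 4), closed_forms(a))
```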




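whence by substitution in (10), $\Gamma'$ becomes

(13)    $$\Gamma' = \lim_{n \to \infty} \frac{\sum_{i=2}^{n} \big(\frac{1}{80}\gamma_i^2 - \frac{1}{60}\gamma_i\big) + M_4(1)}{\Big[\frac{1}{12}\sum_{i=2}^{n} \gamma_i + M_2(1)\Big]^2}.$$

Since $a_i \ge 1$, $\gamma_i \ge 3$, and thus $\sum_{i=2}^{n} \gamma_i \to \infty$ as $n \to \infty$, we may reduce $\Gamma'$ in
the limit to

(14)    $$\Gamma' = \frac{9}{5} \lim_{n \to \infty} \Big[\sum_{i=2}^{n} \gamma_i^2 \Big/ \Big(\sum_{i=2}^{n} \gamma_i\Big)^2\Big].$$

Since $\Gamma' = \frac{9}{5}\Gamma$, the Liapounoff condition $\Gamma' = 0$ for normality becomes
by (5), $\Gamma = 0$.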

4. Necessity of the Condition $\Gamma = 0$. A necessary condition for normality
can be found by noting that if $\varphi_n(u)$ approaches $\varphi(u)$, then the moments of
$\varphi_n(u)$ must approach the corresponding moments of $\varphi(u)$.⁵ Letting $\mu_4(n)$ be
the 4th moment of $\varphi_n(u)$ and $\mu_4$ the corresponding moment of the normal curve,
a necessary condition is that $\mu_4(n) \to \mu_4$ as $n \to \infty$, and $\mu_4 = 3$. The 4th
moment of $\varphi_n(u)$ may be expressed simply in terms of the moments of $f_n(x)$.
If the symbol E stands for expected value, the second and fourth moments of
$f_n(x)$ are $E(x_n - \bar{x}_n)^2$ and $E(x_n - \bar{x}_n)^4$ respectively, and the relationship is then

(15)    $$\mu_4(n) = \frac{E(x_n - \bar{x}_n)^4}{[E(x_n - \bar{x}_n)^2]^2}
= \frac{E\Big[\sum_{i=1}^{n} (s_i - \bar{s}_i)\Big]^4}{\Big\{E\Big[\sum_{i=1}^{n} (s_i - \bar{s}_i)\Big]^2\Big\}^2}.$$

Expanding the sums by the multinomial theorem and taking the expected value
of each term we obtain

(16)    $$E(x_n - \bar{x}_n)^2 = \sum_{i=1}^{n} E(s_i - \bar{s}_i)^2 + 2\sum_{i<j} E(s_i - \bar{s}_i)\,E(s_j - \bar{s}_j) = \sum_{i=1}^{n} M_2(i),$$

where $M_2(i)$ is the second moment of $g_i(x)$. In a similar manner we have

(17)    $$\begin{aligned}
E(x_n - \bar{x}_n)^4 &= \sum_{i=1}^{n} M_4(i) + 6\sum_{i<j} M_2(i)M_2(j) \\
&= \sum_{i=1}^{n} M_4(i) + 3\Big[\sum_{i=1}^{n} M_2(i)\Big]^2 - 3\sum_{i=1}^{n} M_2^2(i);
\end{aligned}$$

whence

(18)    $$\mu_4(n) = 3 + \frac{\sum_{i=1}^{n} M_4(i) - 3\sum_{i=1}^{n} M_2^2(i)}{\Big[\sum_{i=1}^{n} M_2(i)\Big]^2}.$$

Since a necessary condition for normality is that $\lim \mu_4(n) = \mu_4 = 3$, the
fraction in the above equation must in the limit approach zero. Substituting
$M_2(i) = \frac{1}{12}\gamma_i$ and $M_4(i) = \frac{1}{80}\gamma_i^2 - \frac{1}{60}\gamma_i$, we find that this ratio reduces
immediately in the limit to the condition $\Gamma = 0$.

⁵ Uspensky, loc. cit., pages 383-388.
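
Formula (18) can be compared with the fourth moment obtained directly from the distribution of the sum. The sketch below is an added illustration, not part of the paper: it assumes that every $s_i$, including the first, has the uniform weights (7) on $0, 1, \dots, a_i$, and the function names are introduced here. The normalized fourth moment is computed once by convolving the weights and once from (18).

```python
from fractions import Fraction

def uniform(a):
    """Weights (7): mass 1/(a + 1) on each of 0, 1, ..., a."""
    return {x: Fraction(1, a + 1) for x in range(a + 1)}

def convolve(g, f):
    """Distribution of the sum of two independent integer variables."""
    out = {}
    for i, w in g.items():
        for x, p in f.items():
            out[x + i] = out.get(x + i, Fraction(0)) + w * p
    return out

def central(f, order):
    """Central moment of a finitely supported distribution."""
    mean = sum(x * p for x, p in f.items())
    return sum((x - mean) ** order * p for x, p in f.items())

def mu4_direct(a_values):
    """Normalized 4th moment of s_1 + ... + s_n, computed from its distribution."""
    f = {0: Fraction(1)}
    for a in a_values:
        f = convolve(uniform(a), f)
    return central(f, 4) / central(f, 2) ** 2

def mu4_formula(a_values):
    """The right-hand side of (18)."""
    m2 = [central(uniform(a), 2) for a in a_values]
    m4 = [central(uniform(a), 4) for a in a_values]
    return 3 + (sum(m4) - 3 * sum(v * v for v in m2)) / sum(m2) ** 2

print(mu4_direct([1, 2, 3, 4]), mu4_formula([1, 2, 3, 4]))
```

The two values agree exactly, and for $a_i = 1$ the value moves toward 3 as the number of terms grows.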





5. Application to the Distribution of Inversions. A frequency table may be 
set up for the number of permutations of n objects that give rise to a fixed 
number of inversions. Three objects marked 1, 2, 3 may be permuted in 
6 ways: 


(123), (132), (213), (231), (312), (321). 



















If (123) is taken as standard position, the numbers of inversions required
to bring each member of the above set into standard position are respectively
0, 1, 1, 2, 2, 3. Thus we pass from (321) to (123) by the following three
inversions or adjacent interchanges: (312), (132), (123). Among the six
permutations there is one giving rise to 0 inversions, two having 1 inversion, two having
2 inversions, and one having 3 inversions.

The distribution of inversions finds its application in a test of significance.
The standard position is taken as a hypothesis of rank order, and the difference
between an observed set of ranks and the hypothetical one is measured by the
number of inversions. The distribution may then be used for finding the
probability of obtaining by chance the number of inversions found, or less.
For a moderate number of ranks (six or more), the distribution of inversions may
be approximated by a normal curve. We shall show that as the number of
ranks is increased, the normalized distribution of inversions approaches the
normal distribution. The distribution of inversions of 1, 2, 3, 4 objects will
be found in the table below.





Inversions: x          0    1    2    3    4    5    6

1·f_1(x)               1
1·2·f_2(x)             1    1
1·2·3·f_3(x)           1    2    2    1
1·2·3·4·f_4(x)         1    3    5    6    5    3    1
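
The entries of such a table can be obtained by brute force. The following Python sketch is an added illustration; it counts, for each permutation, the pairs of objects that stand out of their standard order, which equals the number of adjacent interchanges needed to restore standard position. The function name `inversion_counts` is introduced here.

```python
from itertools import permutations

def inversion_counts(n):
    """counts[x] = number of permutations of n objects having exactly x inversions."""
    counts = [0] * (n * (n - 1) // 2 + 1)
    for perm in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        counts[inv] += 1
    return counts

for n in range(1, 5):
    print(n, inversion_counts(n))
```

Dividing each row by n! gives the distribution $f_n(x)$; the enumeration is only practical for small n, since it visits all n! permutations.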




By induction one may show that the following relationships hold between
successive distributions:

(19)    $$\begin{aligned}
f_2(x) &= \tfrac{1}{2}\,[f_1(x - 0) + f_1(x - 1)], \\
f_3(x) &= \tfrac{1}{3}\,[f_2(x - 0) + f_2(x - 1) + f_2(x - 2)], \\
&\cdots\cdots\cdots\cdots\cdots \\
f_n(x) &= \tfrac{1}{n}\,[f_{n-1}(x - 0) + f_{n-1}(x - 1) + f_{n-1}(x - 2) + \cdots + f_{n-1}(x - n + 1)].
\end{aligned}$$

Since this satisfies the basic recurrence formula (2), where $a_n = n - 1$, we may
find out whether the normalized distribution of inversions approaches $\varphi(u)$.
With $\gamma_n = n^2 - 1$ the condition $\Gamma = 0$ becomes

$$\lim_{n \to \infty} \; \sum_{i=2}^{n} (i^2 - 1)^2 \Big/ \Big[\sum_{i=2}^{n} (i^2 - 1)\Big]^2 = 0.$$

The numerator sums to a polynomial of the 5th degree in n, while the bracket
of the denominator sums to a 3d degree polynomial, which after squaring is of
the 6th degree; so that as $n \to \infty$ we have in the limit $\Gamma = 0$. Thus the
normalized distribution function of the inversions of n objects approaches $\varphi(u)$
as $n \to \infty$.

Equations (12) and (16) permit us to find the mean and standard deviation
of the distribution of the inversions of n objects:

(20)    $$\bar{x}_n = \tfrac{1}{4}\,n(n - 1), \qquad \sigma_n^2 = \tfrac{1}{72}\,n(n - 1)(2n + 5).$$

The sequence of binomial coefficients, and the distribution of inversions are
examples of sequences that satisfy recurrence relation (2); it should be noted
that their respective values of $\gamma_n$ ($\gamma_n = 3$ or $\gamma_n = n^2 - 1$) may be considered
as bounded between two polynomials of the same degree in n. Whenever this
is true the condition $\Gamma = 0$ will hold and $\varphi_n(u)$ will approach $\varphi(u)$. On the
other hand, if for example, $\gamma_n = r^n$, then $\Gamma \neq 0$ and $\varphi_n(u)$ does not approach $\varphi(u)$.
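
The mean and variance of the number of inversions, $\frac{1}{4}n(n-1)$ and $\frac{1}{72}n(n-1)(2n+5)$, can be checked against the distributions generated by the recurrence with $a_n = n - 1$. The sketch below is an added illustration; the function names are introduced here, and `gamma_n` evaluates the ratio of the condition $\Gamma = 0$ for a finite n only.

```python
from fractions import Fraction

def inversion_distribution(n):
    """f_n(x) for the inversions of n objects, built with a_k = k - 1."""
    f = [Fraction(1)]
    for k in range(2, n + 1):
        a = k - 1
        g = [Fraction(0)] * (len(f) + a)
        for x, p in enumerate(f):
            for shift in range(a + 1):
                g[x + shift] += p / (a + 1)
        f = g
    return f

def mean_var(f):
    """Mean and variance of a distribution stored as a list indexed by x."""
    m = sum(x * p for x, p in enumerate(f))
    return m, sum((x - m) ** 2 * p for x, p in enumerate(f))

def gamma_n(n):
    """Finite-n ratio sum (i^2 - 1)^2 / [sum (i^2 - 1)]^2 for i = 2, ..., n."""
    g = [i * i - 1 for i in range(2, n + 1)]
    return Fraction(sum(v * v for v in g), sum(g) ** 2)

for n in (3, 6, 10):
    print(n, mean_var(inversion_distribution(n)), float(gamma_n(n)))
```

The ratio decreases roughly like 9/(5n), consistent with a 5th degree numerator and a 6th degree denominator.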


6. Smoothing Formulas. The general recurrence formula (6),

$$f_n(x) = \sum_{i=-\infty}^{+\infty} g_n(i) f_{n-1}(x - i),$$

may be considered as a linear smoothing formula. For example, we may obtain
the usual three point smoothing formula based on binomial coefficients for
smoothing a distribution $f_1(x)$ into $f_2(x)$ by setting in the above equation n = 2,
$g_2(i) = \frac{1}{4}\binom{2}{i+1}$ for $-1 \le i \le +1$, and $g_2(i) = 0$ for $i < -1$ or $i > +1$. Thus

(21)    $$f_2(x) = \tfrac{1}{4}\,[f_1(x + 1) + 2f_1(x) + f_1(x - 1)].$$










From considerations found in Section 2, we see that if a variable $x_1$ has for
distribution $f_1(x)$ and a variable $s_2$ has for distribution $g_2(x)$, then their sum
$s_2 + x_1$ has for distribution function the smoothed distribution $f_2(x)$. From
this point of view, the smoothed distribution $f_2(x)$, obtained by applying a
linear smoothing formula, is a "cross" between the original unsmoothed
distribution $f_1(x)$ and the artificial weight distribution $g_2(x)$.

Often a smoothing formula is used several times; first on the original
distribution, then on the smoothed distribution, and then sometimes on the smoothed
smoothed distribution. If a linear smoothing formula is thus iterated 1, 2,
3, ..., n, ... times, the sequence of smoothed distributions obtained upon
normalization approaches $\varphi(u)$. This may easily be demonstrated by showing that
Liapounoff's condition for normality, $\Gamma' = 0$, is satisfied. Since in this case
the weight distribution $g_n(x)$ is the same for all $n \ge 2$, the corresponding moments
of these distributions must all be equal; thus we may write $M_2(n) = M_2(2)$
and $M_4(n) = M_4(2)$ where $n \ge 2$. Substituting in (10), we obtain for $\Gamma'$

(22)    $$\Gamma' = \lim_{n \to \infty} \frac{M_4(1) + (n - 1)M_4(2)}{[M_2(1) + (n - 1)M_2(2)]^2},$$

where $M_2(1)$ and $M_4(1)$ are the 2d and 4th moments of the unsmoothed
distribution $f_1(x)$. The numerator is of the first degree in n and the denominator
of the second, so that the limit is zero. The mean value $\bar{x}_n$ and the standard
deviation $\sigma_n$ of the distribution $f_n(x)$ formed by iterating a smoothing formula
n − 1 times are easily shown to be

(23)    $$\bar{x}_n = \bar{x}_1 + (n - 1)\bar{s}_2, \qquad \sigma_n^2 = \sigma_1^2 + (n - 1)\sigma_{s_2}^2,$$

where $\bar{x}_1$ and $\sigma_1$ are the mean and standard deviation of the original unsmoothed
distribution, and where $\bar{s}_2$ and $\sigma_{s_2}$ are the mean and standard deviation of the
weight distribution $g_2(x)$.
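
The effect of iterating the three point formula (21) can be watched numerically. The Python sketch below is an added illustration under stated assumptions: the starting distribution is a hypothetical two-point distribution with mass 1/2 at x = 0 and at x = 10, and the function names `smooth` and `summary` are introduced here. For the weights 1/4, 1/2, 1/4 the mean is 0 and the variance is 1/2, so the mean should stay at 5 and the variance should grow by 1/2 with each smoothing.

```python
from fractions import Fraction

def smooth(f):
    """Three point formula (21): f2(x) = [f1(x + 1) + 2 f1(x) + f1(x - 1)] / 4."""
    out = {}
    for x, p in f.items():
        for shift, w in ((-1, 1), (0, 2), (1, 1)):
            out[x + shift] = out.get(x + shift, Fraction(0)) + Fraction(w, 4) * p
    return out

def summary(f):
    """Mean, variance and normalized 4th moment of a finite distribution."""
    m = sum(x * p for x, p in f.items())
    var = sum((x - m) ** 2 * p for x, p in f.items())
    m4 = sum((x - m) ** 4 * p for x, p in f.items())
    return m, var, m4 / var ** 2

# Hypothetical unsmoothed distribution: two equal spikes, far from normal.
f = {0: Fraction(1, 2), 10: Fraction(1, 2)}
history = {0: summary(f)}
for n in range(1, 51):
    f = smooth(f)
    if n in (5, 50):
        history[n] = summary(f)

for n, (m, var, k) in sorted(history.items()):
    print(n, m, var, float(k))
```

The normalized fourth moment starts at 1 and moves toward the normal value 3 as the smoothing is repeated, which is the behaviour the warning at the end of this section refers to.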

The linear smoothing formula is used in practical work to smooth data. 
Successive application of one or many such linear formulas will usually smooth 
any set of values to the normal curve of error. The above section serves as a 
warning of what is introduced by the use of such methods. 

It is a pleasure to acknowledge the helpful criticisms and advice of Dr. W. E. 
Deming in the preparation of the manuscript. 


WASHINGTON, D. C.





THE LENGTH OF THE CYCLES WHICH RESULT FROM THE 
GRADUATION OF CHANCE ELEMENTS 


By EDWARD L. DODD


1. Introduction. Eugen Slutzky¹ found that under certain conditions
repeated summations of chance elements lead to a sinusoidal configuration.
Generalizations were made by V. Romanovsky.² A more recent paper by
Slutzky³ has appeared, summarizing his original Russian memoir, and making
extensions. Contributions to this subject have also been made by H. E. Jones,⁴
E. J. Moulton,⁵ and A. Wald.⁶

Readers who wish to get into touch with recent literature on periodicity are
referred to two excellent books, that of Karl Stumpff⁷ with bibliography of 319
references, and that of Herman Wold,⁸ with bibliography of nearly 70 references.

In this paper, I deal with the wavy configuration resulting from a single
application of a specified graduation formula. For this purpose, only linear
operators are considered. For actual graduation it is customary to require
that the sum of the coefficients or "weights" be equal to unity. But for the
present purpose, this requirement is irrelevant. For example, summing and
averaging are here essentially identical. The graduation formula considered
may or may not be the combination of simple summations or averages. Indeed,
formulas preferred by actuaries and statisticians include terms with negative
coefficients; and thus involve an operation other than addition. F. R.
Macaulay⁹ gives a chart of 24 weight diagrams. Of these only the first four are
without negative coefficients.


¹ Eugen Slutzky, "Sur un théorème limite relatif aux séries des quantités éventuelles."
Comptes Rendus, Vol. 185 (1927) pp. 169-171.

² V. Romanovsky, "Généralisations d'un théorème de M. E. Slutzky." Comptes
Rendus, Vol. 192 (1931) pp. 718-721. "Sur la loi sinusoidale limite." Rendiconto Circolo
Mathematico di Palermo, Vol. 56 (1932) pp. 82-111. "Sur une généralisation de la loi
sinusoidale limite." Ibid., Vol. 57 (1933) pp. 130-136.

³ E. Slutzky, "The summation of random causes as a source of cyclic processes."
Econometrica, Vol. 5 (1937) pp. 105-146.

⁴ H. E. Jones, "The theory of runs applied to time series," Report of Third Annual
Conference of Cowles Commission for Research in Economics (1937) pp. 33-36. This abstract
itself does not include reference to repetitions, mentioned by Moulton and Wald.

⁵ E. J. Moulton, "The periodic function obtained by repeated accumulation of a
statistical series." American Mathematical Monthly, Vol. 45 (1938), pp. 583-586.

⁶ A. Wald, "Long cycles as a result of repeated integration." American Mathematical
Monthly, Vol. 46 (1939), pp. 136-141.

⁷ Karl Stumpff, Grundlagen und Methoden der Periodenforschung, Berlin, 1937, Julius
Springer.

⁸ Herman Wold, A Study in the Analysis of Stationary Time Series. Uppsala, 1938,
Almqvist and Wiksells.


Of course, the “waves” produced are irregular, and the difficulty of defining a
cycle-length confronts us. The apparently naive definition of a cycle-length as
the average distance between successive maxima (or minima) is, I believe, worth
consideration as a rough first approximation of the cycle length for graduated
values delivered by formulas with negative coefficients or by those involving at
least triple summations. But the cycle length thus determined is somewhat too
short; for slight undulations will occur (Slutzky calls them “ripples”) which
should be eliminated if we want a cycle-length intuitionally reasonable. On the
other hand, the cycle-length defined as the average distance between alternate
intersections of the graduated curve with the base line is likely to be decidedly
too long, as illustrated by Slutzky’s Figure 2 (loc. cit., p. 109), which exhibits
1,000 graduated items with 41 marked maxima and 41 marked minima (after
elimination of what he considers ripples) but with only 23 up-crossings and
23 down-crossings of the base line. I indicate in what follows an analytic
method for removing ripples. And I describe several methods for obtaining a
number which might be called a cycle-length. Often these seem to cluster about a
central value, which appears to me to be a reasonable estimate of the “length
of the cycle” created by the specified graduation formula.

The theory to be presented here assumes that the chance elements are normally 
distributed about zero with constant variance. But the data used by Slutzky 
came from lottery drawings, with a “rectangular” distribution; and for checking 
I have likewise used rectangular distributions; mainly, three sets of 600 numbers 
each, taken from the tenth figures of logarithms in the Vega Tables. It is 
known, however, that the average of a few elements distributed rectangularly 
is nearly normal. From many tests that I have made, it would seem that 
rectangular distributions react as if normal. To illustrate: When normal data 
are given a twelve-fold summation or averaging by twos, the probabilities that 
at a specified point there will be an upcrossing of the base line, a maximum, or an 
inflection from concave to convex are respectively, 0.0628, 0.106, and 0.134. 
These numbers multiplied by 100 give 6.28, 10.6, and 13.4, as the expected 
number of occurrences per hundred graduated values. Slutzky exhibits in 
Figure 4 (loc. cit., p. 111) 100 ordinates as the result of applying to lottery 
drawings 12-fold summation by twos. The figure shows 6 or 7 up-crossings, 
ten maxima, and 13 or 14 such inflections—in close agreement with expectations 
based upon normal distributions. 























⁹ F. R. Macaulay, The Smoothing of Time Series. Publications of the National Bureau of
Economic Research, Incorporated, No. 19 (1931). See pp. 77-79.

2. Derivation of Probabilities and Comparison of Actual with Expected
Occurrences. A “cycle length” is first conceived of as the reciprocal of a relative
frequency or probability. Thus, if the probability that a graduated value will
be a maximum is 0.05, we expect 5 maxima per hundred graduated values,
making the “cycle length” for maxima equal to 20. It will be recalled that if p
is the probability of an occurrence of an event in a single trial, then in s trials
the expected number of occurrences is sp, whether the trials are independent or not.
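
The remark that independence is not needed follows from the additivity of expected values. As a brief sketch, let $I_r$ be an indicator equal to 1 when the event occurs at the $r$-th trial and 0 otherwise (the symbol $I_r$ is introduced here for illustration and does not appear in the original):

```latex
\[
E\Bigl[\sum_{r=1}^{s} I_r\Bigr] \;=\; \sum_{r=1}^{s} E[I_r] \;=\; sp,
\qquad
\text{cycle length} \;=\; \frac{1}{p};
\qquad
p = 0.05 \;\Rightarrow\; 100\,p = 5, \quad \frac{1}{p} = 20 .
\]
```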

It is assumed that the data, $x_1, x_2, \cdots$ are independent and normally distributed
about zero with constant variance $V$. Then any linear function

(1)    $y_r = a_{-m}x_{r-m} + \cdots + a_0 x_r + a_1 x_{r+1} + \cdots + a_m x_{r+m}$

is likewise normally distributed about zero; and the variance of $y_r$ is $V\,\Sigma a_i^2$.

(a) Probabilities When Two Conditions Are Imposed. Consider first the
“planes” $y_{r-1} = 0$ and $y_r = 0$, each in $2m + 1$ dimensions; and jointly in $2m + 2$
dimensions. They form four “dihedral” angles. Let

(2)    $\theta$ = angle between $y_{r-1} = 0$ and $y_r = 0$,

the inside points $(x_{r-m-1}, \cdots, x_{r+m})$ being such that $y_{r-1} < 0$, and $y_r > 0$.
Now, an orthogonal transformation or “rotation” leaves invariant this angle
$\theta$ and also the normal probability function:

(3)    Probability = Const. $\cdot \exp\,[-\Sigma x_i^2/2V]$.


The angle $\theta$ may be found¹⁰ from

(4)    $\cos\theta = \sum_{i=-m}^{m-1} a_i a_{i+1} \Big/ \sum_{i=-m}^{m} a_i^2$.


Let us think of the rotation which carries the intersection of the planes into
the “vertical” position. To find the probability that $y_{r-1} < 0$ and $y_r > 0$, we
integrate over all $2m + 2$ dimensional space which lies between the two planes
in the dihedral angle thus characterized. For $2m$ of such variables, the integration
extends from $-\infty$ to $+\infty$, yielding unity as a factor. If $u$ and $v$ are the two
variables that remain, then we are to find the volume of that portion of the solid

(5)    $z = (1/2\pi V)\,\exp\,[-(u^2 + v^2)/2V]$

which lies between two vertical planes through the origin making the angle $\theta$
with each other. Then,

(6)    Probability of up-crossing $= \theta/360^\circ$.
(7)    Cycle length for up-crossing $= 360^\circ/\theta$.

Let $\Delta y_r = y_{r+1} - y_r$.

¹⁰ D. M. Y. Sommerville, An Introduction to the Geometry of N Dimensions. Methuen
and Co., Ltd., London, 1929. See p. 76.

Then $y_r$ is a maximum if $\Delta y_{r-1} > 0$ and $\Delta y_r < 0$. Suppose

(8)    $\theta_1$ = angle between $\Delta y_{r-1} = 0$ and $\Delta y_r = 0$,

inside points making $\Delta y_{r-1} > 0$ and $\Delta y_r < 0$. Then

(9)    Probability for maximum at $y_r$ $= \theta_1/360^\circ$.
(10)   Cycle length for maxima $= 360^\circ/\theta_1$.


The same equations apply to minima; since for minima we simply reverse
the two foregoing inequalities, and pass to the equal “vertical” dihedral angle.

Likewise, from $\Delta^2 y_{r-1} < 0$ and $\Delta^2 y_r > 0$ we obtain an angle $\theta_2$ such that
$\theta_2/360^\circ$ is the probability for change of inflection from concave downward to
convex downward. This is also equal to the probability for change of inflection
from convex to concave. Such changes of inflection have some interest on
their own account and in checking; but do not seem to have any direct bearing
upon the main problem under discussion here.
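
The two-condition probabilities can be checked numerically. The sketch below is an illustration, not the author's computing procedure: it assumes the cosine of the angle in (4) equals the lag-one correlation of the series in question, obtains the weights of a differenced series by convolving with (1, −1), and applies the result to twelve-fold summation by twos (binomial weights). The function name `sign_change_probability` is hypothetical.

```python
from math import comb

import numpy as np


def sign_change_probability(weights, diff_order=0):
    """theta/360 degrees for two consecutive values of the diff_order-th
    difference of the graduated series having opposite signs in a stated order."""
    w = np.asarray(weights, dtype=float)
    for _ in range(diff_order):
        # Weights of the differenced series, expressed in the original data.
        w = np.convolve(w, [1.0, -1.0])
    # Lag-one correlation, i.e. cos(theta) as in equation (4).
    rho = np.dot(w[:-1], w[1:]) / np.dot(w, w)
    return np.arccos(rho) / (2.0 * np.pi)


# Twelve-fold summation by twos has binomial coefficients as weights.
binomial_12 = [comb(12, j) for j in range(13)]
p_upcross = sign_change_probability(binomial_12, 0)
p_maximum = sign_change_probability(binomial_12, 1)
p_inflection = sign_change_probability(binomial_12, 2)
print(p_upcross, p_maximum, p_inflection)
```

Under these assumptions the three values agree, to the figures quoted, with the 0.0628, 0.106, and 0.134 given in the Introduction.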

(b) Probabilities When Three Conditions Are Imposed. We consider now the
elimination of ripples. To make $y_r$ a maximum, two linear conditions are
required. A third linear condition such as $y_r > \tfrac{1}{2}(y_{r-k} + y_{r+k})$, or simply
$y_r > y_{r+k}$, with $k > 1$, will remove some ripples. Suppose we have given
three planes through the origin,

(11)   $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$,
       $b_1x_1 + b_2x_2 + \cdots + b_nx_n = 0$,
       $c_1x_1 + c_2x_2 + \cdots + c_nx_n = 0$.

The angles between these planes in pairs are

(12)   $\cos\alpha = \dfrac{\Sigma b_ic_i}{(\Sigma b_i^2\cdot\Sigma c_i^2)^{1/2}}$,  $\cos\beta = \dfrac{\Sigma a_ic_i}{(\Sigma a_i^2\cdot\Sigma c_i^2)^{1/2}}$,  $\cos\gamma = \dfrac{\Sigma a_ib_i}{(\Sigma a_i^2\cdot\Sigma b_i^2)^{1/2}}$.

In general, eight trihedral angles are thus formed at the origin; since we may
take acute angles for $\alpha$, $\beta$, and $\gamma$ or their supplements. By an orthogonal
transformation or “rotation about the origin” we are led to the three dimensional
problem of finding the portion of a sphere lying in a specified spherical
pyramid with base a spherical triangle, $ABC$, having spherical excess
$E = A + B + C - 180^\circ$. Now the spherical surface is 4 great circles or $720^\circ$.
Hence, for a maximum, subject to an additional linear homogeneous inequality,

(13)   Probability of conditioned maximum $= E/720^\circ$,

care having been taken to enter the proper trihedral angle.
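
A numerical sketch of (13) follows. It is not the author's worksheet: it uses the standard result that, for three jointly normal variables with zero means and correlations $\rho_{12}, \rho_{13}, \rho_{23}$, the probability that all three are positive is $1/8 + (\arcsin\rho_{12} + \arcsin\rho_{13} + \arcsin\rho_{23})/4\pi$, which is the spherical-excess formula written in terms of correlations. The Spencer 21-term weights are those quoted in Section 4; all function names are hypothetical.

```python
import numpy as np

# Spencer 21-term weights (Section 4), written out in full.
SPENCER_21 = np.array([-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
                       57, 47, 33, 18, 6, -2, -5, -5, -3, -1]) / 350.0


def autocovariance(weights, max_lag):
    """Sum of a_i * a_{i+k} for k = 0..max_lag (covariances of y when V = 1)."""
    w = np.asarray(weights, dtype=float)
    return np.array([np.dot(w[:len(w) - k], w[k:]) if k < len(w) else 0.0
                     for k in range(max_lag + 1)])


def orthant_probability(cov):
    """P(X1 > 0, X2 > 0, X3 > 0) for a zero-mean trivariate normal."""
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    arcsines = (np.arcsin(corr[0, 1]) + np.arcsin(corr[0, 2])
                + np.arcsin(corr[1, 2]))
    return 0.125 + arcsines / (4.0 * np.pi)


def conditioned_maximum_probability(weights, k):
    """P(y_r > y_{r-1}, y_r > y_{r+1}, y_r > y_{r+k}); requires k >= 2."""
    g = autocovariance(weights, k + 1)
    pos = np.arange(k + 2)                      # positions r-1, r, r+1, ..., r+k
    cov_y = g[np.abs(pos[:, None] - pos[None, :])]
    contrasts = np.zeros((3, k + 2))
    contrasts[0, [1, 0]] = 1.0, -1.0            # y_r - y_{r-1}
    contrasts[1, [1, 2]] = 1.0, -1.0            # y_r - y_{r+1}
    contrasts[2, [1, k + 1]] = 1.0, -1.0        # y_r - y_{r+k}
    return orthant_probability(contrasts @ cov_y @ contrasts.T)


p_k5 = conditioned_maximum_probability(SPENCER_21, 5)
p_k7 = conditioned_maximum_probability(SPENCER_21, 7)
print(p_k5, p_k7)
```

For the Spencer formula this sketch gives about 0.0668 with k = 5 and about 0.0623 with k = 7, in line with the probabilities listed in Section 5.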

(c) Probabilities When Four Conditions Are Imposed. To avoid complexities
involved in the use of four intersecting planes, the following expedient was
employed. Consider a set of values of $y_r$ such that this $y_r$ is a maximum. Among
these there is theoretically a certain fraction or proportion $p$ at which also
$y_r > y_{r+k}$, with $k > 1$, and the same proportion $p$ satisfying $y_r > y_{r-k}$. Let $p'$
be the proportion satisfying both inequalities. Then $1 - p' \leq (1 - p) + (1 - p)$
leads to

(14)   $p' \geq 2p - 1 = p^2 - (1 - p)^2$.

If $p$ is fairly close to unity, a good approximation for $p'$ would seem to be

(15)   $p' \approx p^2$.

This $p^2$ would have been exact for $p'$, had the graduated values been independent.
That $p'$ is here only slightly above $2p - 1$ seems likely, from the graduations
that I have examined; for the failure of one of the inequalities $y_r > y_{r+k}$ or
$y_r > y_{r-k}$ was seldom accompanied by the failure of the other.

For graduations with the Spencer 21-term formula, when $k = 5$, $p = 0.936$,
and $(1 - p)^2 = 0.0041$, which is fairly small. In practice, we would find in
this case directly $P = 0.07125$ = probability of a maximum; $Pp = 0.0668$ =
probability of a maximum at $y_r$ with $y_r > y_{r+5}$. Then the probability $Pp'$ of a
maximum at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$ would have as lower bound
$2Pp - P = 2(0.0668) - 0.07125 = 0.06235$.

But a closer approximation to the actual value would seem to be $Pp^2 =
(Pp)^2/P = (0.0668)^2/0.07125 = 0.0626$.

This would give a cycle length of $1/0.0626 = 15.97$.
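
The arithmetic of the lower bound and of the product approximation can be retraced in a few lines. This is a sketch using only the rounded figures printed above; the variable names are illustrative. With these rounded inputs the ratio Pp/P comes out near 0.9375 rather than the quoted 0.936, presumably because the text worked from unrounded values.

```python
P_max = 0.07125        # P: probability of a maximum (from the text)
P_one_sided = 0.0668   # Pp: maximum at y_r with y_r > y_{r+5} (from the text)

p = P_one_sided / P_max                      # proportion of maxima passing one test
lower_bound = 2 * P_one_sided - P_max        # P(2p - 1), from inequality (14)
approximation = P_one_sided ** 2 / P_max     # P p^2, from approximation (15)
cycle_length = 1 / approximation

print(round(p, 4), round(lower_bound, 5), round(approximation, 4),
      round(cycle_length, 2))
```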

(d) Indications from Correlation Theory. We may also attempt to estimate
a cycle length with the aid of correlation theory. If for graduation, we use a
linear operator with coefficients proportional to successive ordinates of a cosine
curve with a specified period, it is, I presume, fairly well known that the graduated
values tend to exhibit the period of that cosine curve. Moreover, this
quasi period may be induced very strongly with the use of formulas which
represent “damped vibration” as shown by H. Labrouste¹¹ and Mrs. Labrouste.
Now many standard graduation formulas have plots resembling somewhat
damped vibration. Here, the central symmetrical arch leading down to the
lowest negative terms on each side is usually large in comparison with the
flanking waves. Now for a strict cosine curve of period $2k$, the coefficient of
correlation of $y_r$ and $y_{r+k}$ is $-1$, at least theoretically. For chance material $y_r$,
with mean zero and constant variance, the coefficient of correlation between $y_r$
and $y_{r+j}$ is defined in terms of expected values, thus:

(16)   $\rho_j = E(y_r y_{r+j})/E(y_r^2)$.

For graduated values, $y_r$, we might then seek the value $j$ which will make $\rho_j$
as close to $-1$ as possible. But for most common graduation formulas, $\rho_j$ does
not approximate $-1$. This difficulty, however, disappears if the graduation
formula is properly centered. In a Fourier series, there is a constant term,
which is of no significance in indicating oscillations, and is sometimes eliminated.
The analogous modification for a linear graduation formula with $n$ coefficients
(of which the sum is unity) would seem to be the subtraction of $1/n$ from each
coefficient, forming what I regard as a residual. For this residual, negative
correlations of substantial size appear. And that $j$ with which the numerically
largest negative correlation $\rho_j$ is associated may be considered as indicating a
half-cycle length.

¹¹ H. and Mrs. Labrouste, “Harmonic analysis by means of linear combinations of
ordinates,” Terrestrial Magnetism and Atmospheric Electricity, Vol. 41 (1936) pp. 15-28. See
pp. 17, 18.

In the case of the Spencer 21-term formula, $j = 8$, making cycle-length = 16,
just about identical with the cycle length for maxima at $y_r$ with $y_r > y_{r-5}$ and
$y_r > y_{r+5}$.
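
The “residual” device can be illustrated as follows. The sketch assumes that the correlation in (16) is computed from the residual coefficients as the sum of lagged products over overlapping terms, divided by the sum of squares; that reading, and the function name `residual_half_cycle`, are assumptions made here rather than details given in the text.

```python
import numpy as np

# Spencer 21-term weights (Section 4), written out in full.
SPENCER_21 = np.array([-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
                       57, 47, 33, 18, 6, -2, -5, -5, -3, -1]) / 350.0


def residual_half_cycle(weights):
    """Lag j >= 1 with the most negative correlation for the residual formula."""
    w = np.asarray(weights, dtype=float)
    residual = w - 1.0 / len(w)                 # subtract 1/n from each coefficient
    # Lagged products for lags 0, 1, ..., n-1.
    acov = np.correlate(residual, residual, mode="full")[len(w) - 1:]
    rho = acov / acov[0]
    j = int(np.argmin(rho[1:])) + 1
    return j, float(rho[j])


half_cycle, rho_min = residual_half_cycle(SPENCER_21)
print(half_cycle, 2 * half_cycle, rho_min)
```

On this reading the most negative correlation for the Spencer residual falls at j = 8, giving the cycle length of 16 stated above.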

(e) The Period of a Closely Fitting Cosine Curve. By another route, also, we
may approach the problem of associating with a specified linear graduation a
number as the length of induced cycles. We shall consider here only those
formulas in which the coefficients are symmetrical with respect to the center.
In equation (1), this means that $a_{-j} = a_j$; $j = 1, 2, \cdots, m$. Suppose now that
the $x$’s are no longer chance elements, but are the successive terms of a cosine
curve with period $k$. That is:

(17)   $x_r = \cos\,(r\theta + \alpha)$;  $\theta = 2\pi/k = 360^\circ/k$.

Then, if $a_{-j} = a_j$, it follows that

(18)   $a_{-j}x_{r-j} + a_j x_{r+j} = 2a_j \cos j\theta \cdot \cos\,(r\theta + \alpha)$.

Then, from (1),

(19)   $y_r = C \cos\,(r\theta + \alpha)$,

where $C$ is independent of $r$. For a given graduation formula, with $a$’s specified,
this $C$ depends upon $\theta$, or we may say, upon $k = 360^\circ/\theta$. We may regard
the graduation formula $y_0$ as “fitting best” the curve $\cos\,[r(360^\circ)/k]$ when $k$ is so
chosen as to give to $C$ a largest value. The presumption is that the graduation
formula will curl chance data up into cycles in about the same fashion as a
cosine curve to which it is closely akin. The actual period of this closely
fitting cosine curve may then be taken as the quasi-period or “cycle-length”
of the graduation formula.
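
Summing (18) over j shows that the factor in (19) is $C = a_0 + 2\sum_{j=1}^{m} a_j \cos j\theta$. The sketch below evaluates this factor and confirms numerically that a symmetric formula applied to a cosine series returns the same cosine multiplied by C. The function names and the phase value 0.3 are illustrative assumptions; no particular “best” period is asserted here.

```python
import numpy as np


def cosine_gain(half_weights, k):
    """C in (19) for symmetric weights a_0, a_1, ..., a_m and cosine period k."""
    a = np.asarray(half_weights, dtype=float)
    theta = 2.0 * np.pi / k
    j = np.arange(1, len(a))
    return a[0] + 2.0 * np.sum(a[1:] * np.cos(j * theta))


def graduate_cosine(half_weights, k, r, alpha=0.3):
    """Apply the full symmetric formula to x_r = cos(r*theta + alpha) at point r."""
    a = np.asarray(half_weights, dtype=float)
    full = np.concatenate([a[:0:-1], a])        # a_m, ..., a_1, a_0, a_1, ..., a_m
    m = len(a) - 1
    theta = 2.0 * np.pi / k
    x = np.cos((r + np.arange(-m, m + 1)) * theta + alpha)
    return float(np.dot(full, x))


# Central and right-hand Spencer 21-term weights (Section 4).
spencer_half = np.array([60, 57, 47, 33, 18, 6, -2, -5, -5, -3, -1]) / 350.0
k, r, alpha = 16, 3, 0.3
theta = 2.0 * np.pi / k
direct = graduate_cosine(spencer_half, k, r, alpha)
via_gain = cosine_gain(spencer_half, k) * np.cos(r * theta + alpha)
print(direct, via_gain)
```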

If, relying upon intuition, we were to select a cosine curve to fit a given 
graduation formula, we might easily decide to disregard the small waves that 
usually flank the central arch, and to take a cosine curve with a span—distance 
between minima—equal to the span of this central arch. In fact, this span 
gives, I believe, a good first estimate of the cycle length of the induced waves. This 
first estimate seems, however, a trifle too small. 






3. Size of Ripples, Simple Summation, Variability, and Height of Waves.
(a) Size of Ripples. In the use of $y_r > y_{r+k}$ to remove ripples, what integer
should we take for $k$? The dividing line between ripples and waves is of course
arbitrary. As Figure 2, p. 109, Slutzky exhibits 1,000 graduated values from
two-fold summations by 10, with ripples removed. He states (p. 119): “maxima
and minima with amplitudes of ten units or less being discarded as ripples.”
For this double summation, I find that the probability that $y_r$ will be a maximum
with $y_r > y_{r+10}$ and $y_r > y_{r-10}$ is approximately 0.0437. Among 1,000 graduated
values, 43.7 such maxima would then be expected. Slutzky marks with arrows
the 41 maxima which remain after the elimination of what he regards as ripples.
The reciprocal of 0.0437 gives 22.9 as cycle length. Then $k = 10$ is less than
half this cycle-length. For standard graduation formulas, it would seem likely
that a value of $k$ about one-third the span of its central arch would eliminate fairly
well the inconsequential fluctuations; and likewise for graduations with coefficients
forming an arch with nearly horizontal ends, like twelve-fold summation
by twos, with arch span 12. For this twelve-fold summation, I find that
0.0831 is the probability that a maximum will occur at $y_r$, with $y_r > y_{r+4}$
and $y_r > y_{r-4}$, giving 8.31 such maxima per hundred graduated values.
Slutzky’s Figure IVa shows eight such maxima, and two ripples.

(b) Simple Summation. I shall not discuss in detail the cycles produced by
simple summation or averaging. Formulas for probability here are relatively
simple. Thus, for the sum or average of $n$ normal chance data, the probability
of a maximum is 1/4, irrespective of the value of $n$. This appears to be about
valid for rectangular data if we count the weak maxima. A simple average of
chance data, however, seems to inherit largely the chaotic character of the parent
data. But some sinuosity is, after all, implanted.
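
The value 1/4 can be seen from the two-condition formula: for a simple sum of n terms with n at least 2, the first difference has weights (1, 0, ..., 0, −1), so consecutive differences are uncorrelated and the angle is 90°. The short sketch below illustrates this; it also shows that the ungraduated data themselves (n = 1) give 1/3. The function name is hypothetical.

```python
import numpy as np


def maximum_probability(weights):
    """theta_1 / 360 degrees for a maximum, from the lag-one correlation
    of the first differences of the graduated series."""
    d = np.convolve(np.asarray(weights, dtype=float), [1.0, -1.0])
    rho = np.dot(d[:-1], d[1:]) / np.dot(d, d)
    return np.arccos(rho) / (2.0 * np.pi)


# Simple summation by n = 2, 3, ..., 12 (all weights equal).
probabilities = {n: maximum_probability(np.ones(n)) for n in range(2, 13)}
raw_data_probability = maximum_probability([1.0])
print(probabilities, raw_data_probability)
```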

(c) Variability. A general discussion of the variability of induced waves is
beyond the scope of this paper. However, I record a numerical result. For
the Spencer 21-term graduation formula, the probability of a maximum is
0.07125. Among 580 graduated values, then, 41.3 maxima would be expected.
Actually, 42 maxima were found. Now, if $n - 1$ points are placed “at random”
on a line of unit length (here $dx$ is the probability that a point will fall in an
interval of length $dx$), then the expected value¹² of the sum of the squares of
the resulting $n$ segments is $2/(n + 1)$. Thus, if 42 points are placed at random
on an interval of 580 units, the expected sum of the squares of the segments
is $(2/44)(580)^2 = 15{,}290.9$. But, if the points are placed at equal
intervals, this sum of squares takes its least value, $(580)^2/43 = 7{,}823.3$. Then,
$15{,}290.9 - 7{,}823.3 = 7{,}467.6$. On the other hand, the 42 maxima among
Spencer graduated values gave segments for which the sum of the squares was
8,656.5; that is, only 833.2 in excess of the above 7,823.3, which represents perfect
periodicity for maxima. Of course, this excess of 833.2 indicates considerable
departure from perfect periodicity; but it is nowhere near the 7,467.6 to be
expected from a random distribution of points. In spite of irregularities, the
sinusoidal character of graduated values is conspicuous.
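
The figures in this comparison can be retraced directly. The sketch below uses only numbers stated in the text (580 graduated values, 42 maxima, and the observed sum of squares 8,656.5); the variable names are illustrative.

```python
length = 580              # number of graduated values
n_points = 42             # maxima actually found
n_segments = n_points + 1

# Expected sum of squared segments for points placed at random: 2/(n+1) * L^2.
random_expectation = 2.0 / (n_segments + 1) * length ** 2
# Least possible value, attained when the points are equally spaced: L^2 / n.
equal_spacing = length ** 2 / n_segments
observed = 8656.5         # sum of squares reported in the text

excess_random = random_expectation - equal_spacing
excess_observed = observed - equal_spacing
print(random_expectation, equal_spacing, excess_random, excess_observed)
```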

¹² W. Burnside, Theory of Probability, Cambridge University Press, 1928. See p. 71.

(d) The Height or Amplitude of Induced Waves. While our chief interest
here lies in what is called the length of a cycle, a brief reference may well be
made to the amplitude or height of the induced waves. The operation of the
linear function $y_r$ in (1) upon data with variance $V$ yields graduated values with
variance $V\,\Sigma a_i^2$. This particular statement does not require the assumption
of normality. Thus the Spencer 21-term formula is expected to produce graduated
values with a standard deviation 37.8% of that of the data. This represents
some reduction, of course; but, nevertheless, the “waves” stand out in bold
relief. They are not diminutive.
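
The 37.8% figure is the square root of the sum of squared weights. A minimal check, using the Spencer 21-term weights quoted in Section 4 (variable names are illustrative):

```python
import numpy as np

# Spencer 21-term weights (Section 4), written out in full.
SPENCER_21 = np.array([-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
                       57, 47, 33, 18, 6, -2, -5, -5, -3, -1]) / 350.0

weight_sum = SPENCER_21.sum()                      # should be unity
sd_ratio = np.sqrt(np.sum(SPENCER_21 ** 2))        # sd of graduated values / sd of data
print(weight_sum, sd_ratio)
```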





4. Data and Graduations Examined. Slutzky’s graduations, exhibited in
Econometrica, Vol. 5, have already been mentioned. Three sets of chance data
were graduated by students at the University of Texas, Mr. Victor W. Pfeiffer
in 1936, Mr. C. M. Tolar and Miss Anna Velma Martin, in 1938, to make tests
with regard to smoothing coefficients,¹³ the results appearing in M.A. theses.
The data were figures in the tenth place of the Vega logarithm tables, 600 numbers
in each set, as follows: Logarithms from 200 to 799; logarithms of cosines
of angles from 0° to 59° 54′, by intervals of 6′; logarithms of sines of angles
from 6′ to 60° by intervals of 6′. The graduation formulas used were all symmetric,
with $a_{-j} = a_j$. Mr. Pfeiffer used the Spencer 21-term formula, with
coefficients 1/350 of:

−1, −3, −5, −5, −2, 6, 18, 33, 47, 57, 60, 57, etc.

The other two formulas used were 11-term formulas which I devised, correct
to third differences, and with fourth differences rather small, described by:
$-1.13\,D^4$ and $-0.97\,D^4$, where $D = \log_e E$ (see Henderson, loc. cit., pp. 26-37);
as compared with $-5.4\,D^4$ for Woolhouse 15-term, and $-12.6\,D^4$ for Spencer
21-term. These two 11-term formulas are:

(i) Averaging by twos, threes, and fours, applied to (1/12) (−4, 3, 14, 3, −4)
yielding (1/288) (−4, −9, 3, 36, 73, 90, 73, 36, 3, −9, −4);

(ii) Triple averaging by threes, applied to (1/10) (−3, 2, 12, 2, −3) yielding
(1/270) (−3, −7, 0, 29, 71, 90, 71, 29, 0, −7, −3). From part of the foregoing
data, also, I made other graduations to check certain probabilities.
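
The two 11-term formulas can be reproduced by convolving the summation kernels with the five-term weights. The following is a sketch using integer numerators only; the helper name `running_sums` is hypothetical.

```python
import numpy as np


def running_sums(*spans):
    """Integer weights produced by successive simple summations of the given spans."""
    kernel = np.array([1])
    for s in spans:
        kernel = np.convolve(kernel, np.ones(s, dtype=int))
    return kernel


# (i) summation by twos, threes, and fours applied to (-4, 3, 14, 3, -4)
formula_i = np.convolve(running_sums(2, 3, 4), [-4, 3, 14, 3, -4])
# (ii) triple summation by threes applied to (-3, 2, 12, 2, -3)
formula_ii = np.convolve(running_sums(3, 3, 3), [-3, 2, 12, 2, -3])

print(formula_i.tolist(), int(formula_i.sum()))    # numerators and divisor for (i)
print(formula_ii.tolist(), int(formula_ii.sum()))  # numerators and divisor for (ii)
```

The sums of the numerators, 288 and 270, are the divisors 2·3·4·12 and 3·3·3·10 that make each set of weights add to unity.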


¹³ Robert Henderson, Graduation of Mortality and Other Tables, Actuarial Society of
America, New York, 1919, p. 34.

5. Cycle Lengths for the Spencer 21-Term Graduation Formula. All the
various determinations of cycle length mentioned in the foregoing were applied
to the Spencer formula, and to some other formulas. The results obtained for
the Spencer formula seem representative, and will be given here in detail. Our
main conception of a cycle-length is that it is the reciprocal of a probability or
relative frequency. The probability of a minimum is the same as that of a
maximum; of a down-crossing of the base line, the same as that of an up-crossing.
Probabilities are listed that the representative ordinate $y_r$ will be a maximum,
with or without further restrictions. The probability is given for an up-cross
at the representative abscissa $x_r$. In the table which follows, a middle entry
for a cycle length of 16 is obtained from the “residual” described in (d) of
Section 2.


The Expected Length of Cycles Produced When Normal Chance Data Are Graduated by the
Spencer 21-term Formula in Accordance with Various Specifications for the Cycle

Specification                                               | Probability | Cycle-Length
Maximum at y_r                                              | 0.07125     | 14.0
Maximum at y_r with y_r > y_{r+5}                           | 0.0668      | 15.0
Maximum at y_r with y_r > (1/2)(y_{r-7} + y_{r+7})          | 0.0657      | 15.2
Maximum at y_r with y_r > y_{r+5} and y_r > y_{r-5}         | 0.0626      | 16.0
By use of “residual” (see 2(d))                             |             | 16
Maximum at y_r with y_r > y_{r+7}                           | 0.0623      |
Period of “best fitting” cosine curve (see 2(e))            |             |
Maximum at y_r with y_r > 0 (or: y_r > mean y_r)            | 0.0591      |
Maximum at y_r with y_r > y_{r+7} and y_r > y_{r-7}         | 0.0545      |
Up-cross at x_r                                             | 0.0469      |


The foregoing exhibit seems to suggest a cycle length of something like 16 for the
cycles created by the operation of the Spencer 21-term formula upon chance data.
This is just about the reciprocal of the probability that a maximum will occur
at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$. If 16 is thus set up as the standard wave
length, a wave of 10 units extending from $x_{r-5}$ to $x_{r+5}$ would not be regarded as
insignificant.

Now 16 is also the interval between the outermost low coefficients, −5, in
the Spencer formula. The plot of a curve through ordinates equal to the
Spencer coefficients would probably make the central arch have a span of
about 15. This 15 seems a little too small as a representative of cycle lengths
obtained by the foregoing different methods.

From the theory set forth, 0.0626 is the probability that a maximum will
occur at $y_r$ with $y_r > y_{r+5}$ and $y_r > y_{r-5}$. Then among 580 graduated values,
36.3 such maxima would be expected. Among the Pfeiffer graduated values 38
were actually found.


6. Comparative Results of Seven Graduation Formulas. An exhibit will now
be made of results obtained from seven graduation formulas. Of these, the
simplest is double averaging or summation by tens, with coefficients forming a
triangular arch, with a “span” which will be set down here as 18. Next in
order of simplicity (avoiding negative coefficients) is 12-fold averaging by twos.
Probabilities are given that a maximum will occur at a point $y_r$, with $y_r > y_{r-k}$
and $y_r > y_{r+k}$ for what seems to be appropriate values of $k$. In the five cases
where graduations were made, the number of the maxima of specified character
actually found are set down in line with their expected values. Also the span
of the central arch is compared with cycle lengths.

Macaulay (loc. cit., pp. 73, 74) mentions favorably a 43-term formula obtained 
as follows: Summation by 8, by 12, doubly by 5, applied to weights: +7, —10, 
0, 0, 0, 0, 0, 0, +10, 0, 0, 0, 0, 0, 0, —10, +7. This is the longest formula to 
be considered here. 

As noted before, my theory is based upon the assumption of a normal distri- 
bution for the data. The data actually tested had a rectangular distribution. 
Nevertheless, close agreement was found between the expected number of 
maxima and the number actually found. 











Results of Applying Seven Graduation Formulas to Chance Data. Comparison of the Expected
Number of Conditioned Maxima with the Actual Number Found Among Graduated
Values, and Comparison of Cycle Length with Span of Central Arch

Columns: (1) Graduation formula; (2) k; (3) Probability of a maximum at y_r with
y_r > y_{r-k} and y_r > y_{r+k}; (4) Number of graduated items, y_r; (5) Expected number
of such maxima; (6) Actual number of such maxima; (7) Cycle length as 1/(3);
(8) Span of central arch.

(1)                            | (2)         | (3)    | (4)   | (5)  | (6) | (7)  | (8)
11-term by Tolar               | 3           | 0.110  | 590   | 64.9 | 67  | 9.09 | 8
11-term by Martin              | 3           | 0.114  | 590   | 67.3 | 65  | 8.77 | 8
13-term (2)^12 by Slutzky      | 4           | 0.0831 | 100   | 8.31 | 8   | 12.0 | 12
19-term (10)^2 by Slutzky      | 10          | 0.0437 | 1,000 | 43.7 | 41  | 22.9 | 18
21-term Spencer by Pfeiffer    | 5           | 0.0626 | 580   | 36.3 | 38  | 16.0 | 15
29-term Kenchington            | 8           | 0.0428 |       |      |     | 23.4 | 20
43-term Macaulay               | [illegible] | 0.0389 |       |      |     | 25.7 | 22










7. Summary. E. Slutzky found that the summing of chance data resulted
in series of numbers with something like a cyclic appearance, this being intensified
by repetition of the summing. Slutzky and others have proven limit
theorems. In this paper, I study the effects of a single application of a graduation
process upon chance data. The most acceptable graduation formulas contain
negative coefficients, and thus involve something more than repeated
summations. Several methods are discussed for assigning to a given graduation
formula a number as the length of the cycles it tends to produce. One of the
most satisfactory of these is in line with the suggestion of Slutzky that before
counting maxima, any insignificant “ripples” should be eliminated. The probability
is found that a graduated value $y_r$ should be a maximum (greater than
the two adjacent values $y_{r-1}$ and $y_{r+1}$) with the further condition that for some
appropriate $k$, $y_r$ shall be greater than $y_{r-k}$ or $y_{r+k}$, or both. The reciprocal of
this latter probability is suggested as the length of the cycle which the given
graduation tends to implant in the graduated values.


THE UNIVERSITY OF TEXAS. 












ON THE DISTRIBUTION OF THE “STUDENT” RATIO FOR SMALL
SAMPLES FROM CERTAIN NON-NORMAL POPULATIONS¹


By H. L. Rietz





Much of interest in the theory and practice of statistical methods has been
developed around the distribution function,

(1)    $\dfrac{\Gamma(N/2)}{\sqrt{\pi}\;\Gamma\!\left(\frac{N-1}{2}\right)}\,(1 + z^2)^{-N/2}$

of the “Student” ratio, $z = (\bar{x} - m)/s$, where $\bar{x}$ denotes the mean, $s$ the standard
deviation of a sample of $N$ items, say $x_1, x_2, \cdots, x_N$, taken at random from a
normally distributed parent population of mean, $m$.

The investigations of certain non-normal parent distributions by Shewhart
and Winters [1], Rider [2], E. S. Pearson [3], M. S. Bartlett [4], and R. C. Geary
[5] indicate that applications of the “Student” theory give more satisfactory
results than the classical theory for a considerable variety of non-normal parent
distributions, but some of these investigators find that the theory fails in certain
cases to describe the facts to an extent that suggests further experimental
sampling investigations along this line whenever suitable data are available.
Others infer that a completely satisfactory analysis of the position of the
“Student” z-test will be possible only if the theoretical distribution of $z$ in samples
from the non-normal distribution in question becomes known. Several of the
above named statisticians have attributed the failures of the distribution (1)
to describe their data, in large part, to the correlation between $x = \bar{x} - m$ and $s$.
For this reason, there is considerable interest in the degree of correlation between
$x = \bar{x} - m$ and $s$, and especially in the nature of the regression of $s$ or of $s^2$ on $x$.

The present paper gives an analysis of data obtained by experimental sampling 
from two non-normal distributions whose sources we shall now describe. The 
parent distributions with which the paper is concerned are theoretical distribu- 
tions resulting from certain urn schemata devised [6] by the writer some years ago. 

In 1925, Leone E. Chesire, in an unpublished thesis for the degree, Master of 
Science, at the University of Iowa, obtained data by experimental sampling, 
that seem to be appropriate material for a study of the correlation of mean and 
standard deviation for small samples from certain non-normal distributions. 

One of the original bivariate parent populations, whose marginal totals we are
using, exhibited linear regression while the other exhibited non-linear regression.
For convenience in distinguishing between the two cases, we shall speak of
material from the linear case as Case I and that from the non-linear case as
Case II. After devising a scheme for drawing pairs of variates at random,
5,000 pairs were drawn in sets of five for each of the two cases.

¹ Presented in part before the American Mathematical Society under a somewhat
different title, November 26, 1937.

While the primary purpose of this experimental sampling was to study the 
distributions of means, standard deviations, and correlation coefficients [7] for 
small samples from the non-normal populations, we have as a by-product, in the 
marginal totals of the correlation tables, four sets of 1,000 pairs of means and 
standard deviations. However, since three of the four sets of marginal totals 
of the two theoretical parent correlations tables are alike, we have actually 
only two significantly different sets to consider. 

Case I. For the case of linear regression of y on x in the bivariate parent 
population, the parent distribution from the marginal totals may be very simply 
described by showing the frequency distribution in Table 1. 


TABLE 1

Sums in second throw of dice
(values of stochastic variable) | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12
Frequency                       | 6 | 12 | 18 | 24 | 30 | 36 | 30 | 24 | 18 | 12 | 6


The moment coefficients and $\beta$’s which characterize the distribution given in
Table 1 are:

Mean = 7, $\mu_2 = 5.83$, $\mu_3 = 0$, $\mu_4 = 80.5$, $\beta_1 = 0$, $\beta_2 = 2.3657$.
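
These constants can be recomputed from the frequencies in Table 1. The sketch below assumes the tabulated values run over the eleven sums 2 through 12 and uses exact fractions; the helper name `central_moment` is illustrative.

```python
from fractions import Fraction

values = range(2, 13)
frequencies = [6, 12, 18, 24, 30, 36, 30, 24, 18, 12, 6]
total = sum(frequencies)

mean = sum(Fraction(f, total) * v for v, f in zip(values, frequencies))


def central_moment(order):
    """Central moment of the given order about the mean of the distribution."""
    return sum(Fraction(f, total) * (v - mean) ** order
               for v, f in zip(values, frequencies))


mu2, mu3, mu4 = (central_moment(k) for k in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(mean, float(mu2), mu3, float(mu4), beta1, float(beta2))
```

The result $\beta_2 - \beta_1 - 3 < 0$ is the condition later cited for regression that is concave downward.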


Each of the 1000 sets of five drawn from the distribution in Table 1 yields a
mean $\bar{y}$ and a standard deviation, $s_y$, which we shall denote by $w$ to make our
notation simpler to write. Table 2 is the correlation table of the pairs $(\bar{y}, w)$.
The correlation coefficient $r_{w\bar{y}}$ between mean $\bar{y}$ and standard deviation $s_y = w$
has a value

$r_{w\bar{y}} = -0.020 \pm 0.021$

which differs insignificantly from zero.
The uncorrected value of the correlation ratio of $w$ on $\bar{y}$ is

$\eta_{w\bar{y}} = 0.182$.

When we remember that the correlation ratio is not free to vary in the negative
direction from 0, and apply the Pearson correction [8] for this situation together
with the “Student” correction [9] for grouping, we obtain for the corrected
$\eta_{w\bar{y}}$ the value 0.133.

It becomes fairly obvious that significant correlation exists and that the 
regression is non-linear. Indeed, it has been shown recently by Geary [5, 
pp. 178-9] that normality in the parent distribution is both a necessary and 





ON THE DISTRIBUTION OF THE “STUDENT” RATIO 267 


TABLE 2 


Correlation of mean $\bar{y}$ and standard deviation $s_y = w$ of samples of five items for Case I. 
Mean of $\bar{y}$'s = 7.141. Correlation coefficient $r_{w\bar{y}} = -0.020 \pm 0.021$. 
$\bar{s}_y = \bar{w} = 2.079$. Correlation ratio of $w$ on $\bar{y}$, $\eta_{w\bar{y}} = 0.182$ (uncorrected). 

[The cell frequencies of the correlation table are illegible in the source.] 


sufficient condition for the independence of the mean and standard deviation 
in samples. 

Since the number of correlated items, N = 1000, is fairly large, we examine 
the significance of $\eta_{w\bar{y}} = 0.182$ under the assumption that $N\eta_{w\bar{y}}^2$ is approximately 
distributed [10] as $\chi^2$ with a - 1 = 16 degrees of freedom. This criterion 
gives odds in favor of significant correlation on approximately a 100 to 1 level of 
probability. 
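
The size of this probability can be checked without tables. The sketch below assumes only the standard closed form for the upper tail of $\chi^2$ with an even number of degrees of freedom; the function name `chi2_sf_even_df` is ours.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi^2 > x) for an even number of degrees of freedom.

    For df = 2k the tail equals exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

N, eta = 1000, 0.182
stat = N * eta ** 2               # N * eta^2 for Case I
p = chi2_sf_even_df(stat, 16)     # a - 1 = 16 degrees of freedom
print(round(stat, 3), p)
```

The resulting tail probability falls a little below 0.01, in line with the "100 to 1" level quoted in the text.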

Next, the means of arrays, $\bar{w}_p$, were plotted to scale on Table 2 to give a 
general notion of the nature of the regression of $w = s_y$ on $\bar{y}$. The location of 
these means of arrays of w's affords at least a suggestion of parabolic regression 
[11] with the curvature concave downward as is to be expected when $\beta_2 - \beta_1 - 3 < 0$, 
where the β's relate to the parent distribution. 





268 H. L. RIETZ 


The next step taken was to analyze the variance, as indicated partly in Table 3, 
where $w_i$ (i = 1, 2, ..., N) denotes the stochastic variates, a the number of 
arrays of w's, $\bar{w}$ the mean of the N values of $w_i$, $n_p$ (p = 1, 2, ..., a) the number 
of variates in an array marked p, $\bar{w}_p$ the mean of the array marked p, and where 
the class interval in Table 2 is taken as the unit. 


TABLE 3 

                                                                 Sum of squares 

For deviations of means of arrays:      $\sum_{p=1}^{a} n_p(\bar{w}_p - \bar{w})^2$   =    380 

For deviations of variates from 
the means of their arrays:              $\sum\sum (w_i - \bar{w}_p)^2$                = 11,098 

Total                                                                                   11,478 


In the exhibit given in Table 3, we use the usual algebraic identity 


(2)  $\sum_{i=1}^{N} (w_i - \bar{w})^2 = \sum_{p=1}^{a} n_p(\bar{w}_p - \bar{w})^2 + \sum\sum (w_i - \bar{w}_p)^2,$ 

where the double sum is made up of a sum of N squares. 
By dividing the members of (2) by N, we have 

(3)  $\frac{1}{N}\sum_{i=1}^{N} (w_i - \bar{w})^2 = \frac{1}{N}\sum_{p=1}^{a} n_p(\bar{w}_p - \bar{w})^2 + \frac{1}{N}\sum\sum (w_i - \bar{w}_p)^2.$ 

The writer has used the identity (3) for many years in lectures to beginners in 
statistics in proving the equivalence of two definitions of the correlation [12] 
ratio and is strongly of the opinion that the equality in form (3) appeals more 
readily to the intuitions of many readers, because of their acquaintance with 
statements in the language of averages, than does the equivalent equality (2) 
in the language of sums of squares. 
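
The identity, and the two equivalent definitions of the correlation ratio that follow from it, can be illustrated numerically. The arrays below are hypothetical values chosen for the illustration, not data from the paper, and the function name is ours.

```python
def variance_decomposition(arrays):
    """Split the total sum of squares into between-array and within-array parts."""
    all_values = [w for arr in arrays for w in arr]
    N = len(all_values)
    grand_mean = sum(all_values) / N
    total = sum((w - grand_mean) ** 2 for w in all_values)
    # n_p * (array mean - grand mean)^2, summed over arrays
    between = sum(len(arr) * (sum(arr) / len(arr) - grand_mean) ** 2 for arr in arrays)
    # squared deviations of each variate from the mean of its own array
    within = sum((w - sum(arr) / len(arr)) ** 2 for arr in arrays for w in arr)
    return N, total, between, within

# Hypothetical arrays of w, one list per class of the mean.
arrays = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 5.0, 6.0, 8.0]]
N, total, between, within = variance_decomposition(arrays)

eta_squared = between / total          # share of variance carried by the array means
eta_squared_alt = 1 - within / total   # one minus the residual share
print(total, between, within, eta_squared)
```

Dividing each of the three sums by N gives the form in averages, which leaves both ratios unchanged.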
In an extended and more compact form, the analysis is shown in the standard 
form in Table 4. 


TABLE 4 


Variance            Degrees of freedom    Sum of squares    Mean square    z-test 

Between arrays              16                   380            23.75      ½ log_e 23.75 = 1.584 
Within arrays              983                11,098            11.29      ½ log_e 11.29 = 1.212 

Total                      999                11,478                       Difference   = 0.372 
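
The entries of Table 4 follow from the sums of squares and degrees of freedom by simple arithmetic. A short sketch (the function name `fisher_z` is ours) reproduces them:

```python
import math

def fisher_z(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, z), z being half the log of their ratio."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    z = 0.5 * math.log(ms_between) - 0.5 * math.log(ms_within)
    return ms_between, ms_within, z

ms_b, ms_w, z = fisher_z(380, 16, 11098, 983)
print(round(ms_b, 2), round(ms_w, 2), round(z, 3))
```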


When the sum of squares equal to 380 associated with variance between arrays 
is further analyzed into a part which could be represented by linear regression, 








and a part which represents deviations of the calculated means of arrays of w's 
from a straight regression line of w on $\bar{y}$, the deviations being measured parallel 
to the w-axis, we find that the part of the amount 380 represented by linear 
regression is given by 


$N r_{w\bar{y}}^2 s_w^2 = 1000\,(.00040)(11.487) = 4.3.$ 


Since both $r_{w\bar{y}} = -0.020 \pm 0.021$ and the small value, 4.3, as part of the sum of 
squares amounting to 380, may well be regarded as sampling fluctuations, we 
revert to the figures in Table 3 and apply the Fisher z-test. It turns out that the 
correlation is significant on practically the 100 to 1 level of probability which 
conforms well with the above inference based on the assumption that $N\eta_{w\bar{y}}^2$ is 
distributed as $\chi^2$, with a - 1 degrees of freedom. 

Next, we computed 1000 values of the “Student” ratio $z = (\bar{y} - 7)/w$, for 
Case I. One of these 1000 values was of the indeterminate form 0/0. A frequency 
distribution of the 999 determinate ratios is shown in column (3), Table 5. 

By grouping together the class frequencies at the tails of the theoretical distributions 
until each of the end class frequencies is not less than 5, and calculating 
$\chi^2$ for the observed distribution in column (3) in comparison with the theoretical 
distribution in column (6) as found from the “Student” theory in samples of 5 
items from a normal distribution, we obtain $\chi^2 = 3.728$ with 11 degrees of 
freedom. 

Thus, the differences between the distribution in column (3) and the “Student” 
distribution for N = 5 shown in column (6) are not only insignificant under the 
$\chi^2$-test, but are so small as to be expected in a relatively small percentage of 
statistical experiments even if the “Student” z-distribution were the theoretically 
exact distribution of our ratios. 

The usual moment coefficients of the distribution of observed z's in column (3), 
Table 5, are: 


$\mu_1' = 0.033533$, $\mu_3 = 0.254383$, $\beta_1 = 0.55955$, 
$s = \sqrt{\mu_2} = 0.69799$, $\mu_4 = 2.22504$, $\beta_2 = 9.37353$. 
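
The β's quoted here are tied to the other moment coefficients by $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$. The sketch below reads the printed figures as $s = \sqrt{\mu_2} = 0.69799$, $\mu_3 = 0.254383$ and $\mu_4 = 2.22504$ (an assumption about the damaged line) and checks that they reproduce the quoted β's to within rounding.

```python
def pearson_betas(mu2, mu3, mu4):
    """Pearson's beta_1 and beta_2 from the second, third and fourth central moments."""
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

s = 0.69799                       # observed standard deviation, so mu2 = s**2
beta1, beta2 = pearson_betas(s ** 2, 0.254383, 2.22504)
print(round(beta1, 4), round(beta2, 3))
```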


Since the value, 0.69799, of the standard deviation of the observed distribution 
differs very little from $1/\sqrt{N - 3} = 0.70711$, the normal curve fitted by using 
the standard deviation of the observed distribution (column 4, Table 5) differs 
very little from the normal curve with the origin at the population mean and 
standard deviation $\sqrt{2}/2$ (column 5). Furthermore, the application of the 
$\chi^2$-test to columns (4) and (5) of Table 5 with class frequencies in the “tails” 
grouped as above gives $\chi^2 = 2.91$ with 9 degrees of freedom. 

The moment coefficients of the observed distribution indicate a markedly 
leptokurtic and somewhat skew distribution but the indications of skewness 
may be traced mainly and perhaps entirely to the presence of the two extreme 
variates at the upper end of the distribution and separated about three times the 
standard deviation from the next class frequency that differs from zero. By 
excluding these two variates from our calculations, we obtain the following 
moment coefficients: 


$\mu_1' = 0.023571$, $\mu_3 = 0.022264$, $\beta_1 = 0.0058786$, 
$s = \sqrt{\mu_2} = 0.662202$, $\mu_4 = 1.009673$, $\beta_2 = 5.2507062$. 


In the observed distribution thus modified, by excluding the extreme upper 
class frequency 2, the evidence of skewness has disappeared. 


TABLE 5 

Distribution of the ratios, $z = (\bar{y} - 7)/w$, in samples of N = 5 for Case I. 

Columns: (3) observed distribution; (4) normal distribution fitted to observed column (3); 
(5) normal distribution of S.D. $1/\sqrt{N - 3} = \sqrt{2}/2$, in same units as z (measured from 
population mean); (6) from the “Student” theoretical distribution for N = 5. 

[The class-by-class entries of Table 5 are illegible in the source; the surviving column totals are 999, 998.8 and 999.0.] 


Case II. For our Case II we have a frequency distribution as shown in 
Table 6. 


TABLE 6 


Totals in second throws of two dice 
(values of the stochastic variable):    2    3    4    5    6    7    8    9   10   11   12 
Frequency:                              1    4    9   16   25   36   35   32   27   20   11 
















Again, since with the uncorrected $\eta_{v\bar{u}}$, Table 7, we have $N\eta_{v\bar{u}}^2 = 31.5$, and 
since $N\eta_{v\bar{u}}^2$ is approximately distributed as $\chi^2$ with a - 1 = 17 degrees of freedom, 
we have odds of the order of 100 to 1 against so large a value being a mere 
sampling fluctuation. 


TABLE 7 

Correlation of mean $\bar{u}$ and standard deviation $s_u = v$ of samples of five items for Case II. 
Mean of $\bar{u}$'s = 6.971. Correlation coefficient $r_{v\bar{u}} = -0.012 \pm 0.020$. 
$\bar{v} = \bar{s}_u = 2.044$. Correlation ratio of v on $\bar{u}$, $\eta_{v\bar{u}} = 0.177$ (uncorrected). 

[The cell frequencies of the correlation table are illegible in the source.] 















































Now proceeding to the analysis of variance, we substitute our numerical values 
derived from Table 7 in the identity 


$\sum_{i=1}^{N} (v_i - \bar{v})^2 = \sum_{p=1}^{a} n_p(\bar{v}_p - \bar{v})^2 + \sum\sum (v_i - \bar{v}_p)^2$ 


and obtain, in terms of class intervals as units, 
10,871 = 340 + 10,531. 


An outline of the analysis is exhibited in Table 8. 


TABLE 8 


Variance            Degrees of freedom    Sum of squares    Mean square    z-test 

Between arrays              17                   340            20.00      ½ log_e 20.00 = 
Within arrays              982                10,531            10.72      ½ log_e 10.72 = 

Total                      999                10,871                       Diff. = 


The moment coefficients and β's which characterize the distribution in Table 6 
are: 


Mean = 7.972, $\mu_2 = 4.888$, $\mu_3 = -1.755$, $\mu_4 = 58.724$, 

$\beta_1 = 0.0264$, $\beta_2 = 2.449$. 


As in the linear case, samples of 5000 pairs of variates were drawn in sets of 
five by Miss Chesire. Analogous to Case I, our first concern is with the regression 
of the standard deviation, $s_u = v$, of u from a sample of five on its mean 
value, $\bar{u}$. 

The correlation table for values of $\bar{u}$ and v is shown in Table 7. The correlation 
coefficient is $r_{v\bar{u}} = -0.012 \pm 0.021$, but the uncorrected correlation ratio of v on $\bar{u}$ is given 
by 


$\eta_{v\bar{u}} = 0.177.$ 
After applying the Pearson and Student corrections, we obtain the corrected 
$\eta_{v\bar{u}} = 0.131$. 


When the sum of squares, 340, associated with variance between arrays is 
further analyzed into a part which could be represented by linear regression, 
and a part which represents deviations of the calculated means of arrays of 
v's from a straight regression line of v on $\bar{u}$, the deviations being measured parallel 
to the v-axis, we find that the part of the amount 340 represented by linear 
regression would be only $N r_{v\bar{u}}^2 s_v^2 = 1000\,(.000144)(10.871) = 1.6$. 

Since both $r_{v\bar{u}} = -0.012 \pm 0.021$ and the small value, 1.6, as part of the sum 
of squares 340, may well be regarded as sampling fluctuations, we revert to the 
figures of Table 8. 

The difference of the logarithms in the last column of Table 8 is 0.32, which 
corresponds to a level of significance of the general order of 100 to 1. Next, 
we calculate and plot on Table 7 the means of arrays of v's to give a general 
notion of the regression of v on $\bar{u}$. The location of these means of arrays suggests 
rather strongly that the regression of v on $\bar{u}$ is parabolic with the curvature 
concave downward as we should expect from the fact that $\beta_2 - \beta_1 - 3 < 0$, 
where the β's pertain to the parent distribution. 

Next, we computed 1000 values of the “Student” ratio, $z = (\bar{u} - 7.972)/v$, 







for Case II. One of these ratios was infinite. A frequency distribution of the 
999 determinate ratios is shown in column 3, Table 9. 

The observed distribution (column 3) and the “Student” distribution (column 
6) of Table 9, to be expected in samples of N = 5, when samples are drawn from 
a normal distribution, are in close agreement. In fact, when we group together 
the tail frequencies of the theoretical distribution until each of them is not less 
than 5, the result of testing the goodness of fit gives $\chi^2 = 17.187$ with 11 degrees 
of freedom. This gives a value in the neighborhood of 0.1 for the probability, 
P, that as large or larger deviations than that experienced will occur, due to 
chance fluctuations, in a single repetition of the experiment. In other words, 
on the basis of this test, the indications are that we should have in the long run, 
as large or larger deviations than we have experienced in this case, in about 
10 per cent of a large number of sets of sampling of 1000 per set even when the 
sampling is from a normal distribution. 


TABLE 9 
Distribution of the ratio, $z = (\bar{u} - 7.972)/v$, in samples of five for Case II. 

Columns: (3) observed; (4) normal distribution fitted to observed column (3); 
(5) normal distribution with S.D. $1/\sqrt{N - 3}$ and origin at population mean; 
(6) Student's z-distribution with N = 5. 

[The class-by-class entries of Table 9 are illegible in the source; column (3) totals 999.] 










SUMMARY 
1. The linear correlation coefficient, r, of the mean and standard deviation 
differs insignificantly from 0 in each case. 
2. The correlation ratio of the standard deviation on the mean differs significantly 
from 0, and the regression of the standard deviation on the mean 
conforms, in its general aspects, to expectation under the theory of Neyman [11]. 

3. The indeterminate “Student” ratio of the form 0/0 in Case I and that of the 
form (constant)/0 in Case II are probably due in part to grouping into class 
intervals, but the infinite ratio would undoubtedly have had such a large value 
that it would be excluded from calculations under any one of the known criteria 
for rejection of extreme observations. 

4. Although the rejection of one indeterminate ratio in each of the two cases is 
slightly disturbing, the evidence presented by our analysis of the experimental 
sampling lends support to the view that the results of the “‘Student”’ theory are 
almost certainly applicable, for many purposes, when the parent distributions 
are of such non-normal types as are involved in our sampling. 


REFERENCES 


[1] W. A. Shewhart and F. W. Winters, “Small samples—new experimental results,” 
Journal American Statistical Association, Vol. 23 (1928), pp. 144-153. 

[2] P. R. Rider, “On the distribution of the ratio of mean to standard deviation in small 
samples from non-normal universes,” Biometrika, Vol. 21 (1929), pp. 124-143. 
“On small samples from certain non-normal universes,” Annals of Mathematical 
Statistics, Vol. 2 (1931), pp. 48-65. 

[3] E. S. Pearson, “The distribution of frequency constants in small samples from non-normal 
symmetrical and skew populations,” Biometrika, Vol. 21 (1929), pp. 
259-286. 

[4] M. S. Bartlett, “The effect of non-normality on the t distribution,” Proceedings of 
the Cambridge Philosophical Society, Vol. 31 (1935), pp. 223-231. 

[5] R. C. Geary, “The distribution of ‘Student's’ ratio for non-normal samples,” Supplement 
to the Journal of the Royal Statistical Society, No. 2, 1936, pp. 178-184. 

[6] H. L. Rietz, “Urn schemata as a basis for the development of correlation theory,” 
Annals of Mathematics, Vol. 21 (1920), pp. 306-322. 

[7] E. S. Pearson, Leona Chesire, and Elena M. Oldis, “Further experiments on the 
sampling distribution of the correlation coefficient,” Journal American Statistical 
Association, Vol. 27 (1932), pp. 121-128. 

[8] Karl Pearson, “On a correction to be made in the correlation ratio,” Biometrika, 
Vol. 8 (1911-12), pp. 254-6. 

[9] “Student,” “The correction to be made in the correlation ratio for grouping,” Biometrika, 
(1913), pp. 316-20. 

[10] R. A. Fisher, Statistical methods for research workers, Fourth Edition, p. 237. 

[11] J. Neyman, “On the correlation of mean and variance in samples drawn from an 
‘infinite’ population,” Biometrika, Vol. 18 (1926), pp. 401-13. 

[12] H. L. Rietz, Mathematical Statistics (Carus Monograph), 1926, p. 91. 


UNIVERSITY OF IOWA, IOWA CITY, IOWA. 




















THE PROBLEM OF m RANKINGS 
By M. G. KENDALL AND B. BABINGTON SMITH 


1. Introduction. If n objects are ranked by m persons according to some 
quality of the objects there arises the problem: does the set of m rankings of n 
show any evidence of community of judgment among the m individuals? For 
example, if a number of pieces of poetry are ranked by students in order of 
preference, do the rankings support the supposition that the students have 
poetical tastes in common, and if so is there any strong degree of unanimity or 
only a faint degree? 

The problem in its full generality permits of no assumption about the nature 
of the quality according to which the objects are ranked, other than that ranking 
is possible. No hypothesis is made that the quality is measurable, still less 
that there is some underlying frequency distribution to the quantiles of which 
the rankings correspond. The quality is to be thought of as linear in the sense 
that any two objects possessing it are either coincident or may be put in the 
relation ‘‘before and after.’’ A metric may, of course, be imposed on this linear 
space by convention; but the relationship between objects is invariant under 
any transformation which stretches the scale of measurement. In particular, 
it is not a condition of the problem that the ranking shall be based on a distri- 
bution according to a normal variate. 

It is permissible to denote the rankings by the ordinal numbers 1, 2, ..., n; 
but it is not permissible, without further discussion, to operate on these numbers 
as if they were cardinals. This point seems to have been inadequately 
appreciated. For instance, when m = 2 we have the familiar case of rank 
correlation between a pair of rankings; and this is frequently treated by subtracting 
corresponding ranks, squaring, and forming the Spearman coefficient 


(1)  $\rho = 1 - \dfrac{6\,S(d^2)}{n^3 - n}.$ 






















To justify this procedure it is necessary to explain what is meant, for example, 
by such a process as (4th minus 8th), or what the square of this difference of 
ordinal numbers represents. 

It is worth stressing that the necessary transition from ordinals to cardinals 
can be made without invoking a scale of measurement. When we rank an 
object as first we mean, in effect, that no member of the set of n is preferred 
to it; when we rank it as the rth we mean that (r - 1) objects are preferred 
to it. The ordinals of the ranking are then biunivocally related to the cardinals 
expressing the number of objects which are preferred. It is thus legitimate 
to apply the laws of cardinal arithmetic to them. For example, if an object A is 
ranked $r_1$ by Brown, $r_2$ by Jones and $r_3$ by Robinson we may form the sum 
$(r_1 + r_2 + r_3)$, which is to be interpreted as meaning that, taking the preferences 
of the three persons together, there were $(r_1 + r_2 + r_3 - 3)$ cases in which 
some other object was preferred to A. The point is of some importance, in 
view of the prevailing practice of replacing ranks by quantiles of the normal 
distribution—a practice which cannot always be regarded as justifiable and is 
sometimes little short of desperate. 
To fix the ideas, consider the following three rankings of six objects 


Object:          A    B    C    D    E    F 
                 5    4    1    6    3    2 
                 2    3    1    5    6    4 
                 4    1    6    3    2    5 
Sum of ranks    11    8    8   14   11   11 


We may sum the ranks for each object, as shown. These sums (which must 
add to 63, and in general to mn(n + 1)/2) reflect the degree of resemblance 
among the rankings. If the resemblance were perfect the sums would be 3, 
6, 9, 12, 15, 18 (though not necessarily, of course in that order) and in such a 
case would be as different as possible. On the other hand, when there is little 
or no resemblance, as in the example given, the sums are approximately equal. 
It is thus natural to take the variance of these sums as providing some measure 
of the concordance in the rankings. If S is the observed sum of squares of the 
deviations of sums of ranks from the mean value m(n + 1)/2 (i.e. is n times 
the variance) we may write 

(2)  $W = \dfrac{12S}{m^2(n^3 - n)}$ 


and call W the coefficient of concordance. Here $m^2(n^3 - n)/12$ is the maximum 
possible value of S, occurring if there is complete unanimity in the rankings, 
so that W may vary from 0 to 1. In the example given, S = 25.5, W = 0.16. 
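
The computation of S and W for the example can be sketched as follows. The function name `concordance` is ours, and exact fractions are used so that the half-integer value of S is not rounded.

```python
from fractions import Fraction

def concordance(rankings):
    """Return (S, W) for m rankings of n objects, each ranking a permutation of 1..n."""
    m, n = len(rankings), len(rankings[0])
    sums = [sum(col) for col in zip(*rankings)]      # sum of ranks for each object
    mean = Fraction(m * (n + 1), 2)                  # mean of the rank sums
    S = sum((s - mean) ** 2 for s in sums)
    W = 12 * S / (m ** 2 * (n ** 3 - n))
    return S, W

# The three rankings of six objects A..F given in the text.
rankings = [
    [5, 4, 1, 6, 3, 2],
    [2, 3, 1, 5, 6, 4],
    [4, 1, 6, 3, 2, 5],
]
S, W = concordance(rankings)
print(float(S), round(float(W), 2))
```

With complete unanimity, for instance four identical rankings of three objects, the same function returns W = 1.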
The coefficient W has arisen in several ways. 

(a) W is simply related to the average of the $\binom{m}{2}$ Spearman rank correlation 
coefficients between pairs of the m rankings. It is easy to show that the average 
$\rho_{av}$ is given by 


(3)  $\rho_{av} = \dfrac{\dfrac{12S}{n^3 - n} - m}{m^2 - m}$ 

(4)  $\phantom{\rho_{av}} = \dfrac{mW - 1}{m - 1}.$ 


$\rho_{av}$ has been considered by Kelley [3] as a measure of average intercorrelation in 
rankings, but he gives no results for testing the significance of observed values. 
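
The relation between W and the average Spearman coefficient can be checked on the three rankings of the earlier example. This is a sketch with helper names of our own choosing; it compares the directly averaged pairwise coefficients with $(mW - 1)/(m - 1)$.

```python
from fractions import Fraction
from itertools import combinations

def spearman(a, b):
    """Spearman coefficient 1 - 6*S(d^2)/(n^3 - n) for two rankings without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - Fraction(6 * d2, n ** 3 - n)

def average_rho(rankings):
    """Average of the Spearman coefficients over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)

def rho_from_W(rankings):
    """The same average obtained from W via (mW - 1)/(m - 1)."""
    m, n = len(rankings), len(rankings[0])
    mean = Fraction(m * (n + 1), 2)
    S = sum((sum(col) - mean) ** 2 for col in zip(*rankings))
    W = 12 * S / (m ** 2 * (n ** 3 - n))
    return (m * W - 1) / (m - 1)

rankings = [
    [5, 4, 1, 6, 3, 2],
    [2, 3, 1, 5, 6, 4],
    [4, 1, 6, 3, 2, 5],
]
print(average_rho(rankings), rho_from_W(rankings))
```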


















It is to be noted that whereas W may vary from 0 to 1, $\rho_{av}$ may vary from 
$-1/(m - 1)$ to 1, i.e. it is asymmetrical like the coefficient of intraclass correlation, 
to which it bears some resemblance.¹ 

(b) Friedman [1] has considered a quantity $\chi_r^2$ related to W by the equation 


(5)  $\chi_r^2 = m(n - 1)W.$ 


(c) Welch [6] and Pitman [5] have considered the problem of the distribution 
of variance in an array 


$a_1, a_2, \ldots, a_n$ 

$b_1, b_2, \ldots, b_n$ 


etc., for permutations of the numbers a, b, etc. in rows. 
This is more general than the ranking case, in which $a_1 \ldots a_n$, $b_1 \ldots b_n$, etc. 
reduce to permutations of the numbers $1 \ldots n$. Since $S'$, the total sum of 
squares in an array of m rankings of n, is $m(n^3 - n)/12$, we have 

(6)  $W = \dfrac{S}{mS'}$, 

i.e. the ratio of variance between columns to the total variance. 





2. Significance of W. To test whether an observed value of W is significant 
it is necessary to consider the distribution of W (or, more conveniently, of S) 
in the universe obtained by permuting the n ranks in all possible ways. No 
generality is lost by supposing one ranking fixed, and the others will then give 
rise to $(n!)^{m-1}$ values of S. 

The actual distribution of W (or S), as will be seen below, is irregular for low 
values of m and n, and likely to be quite irregular for moderate values. It 
may, however, be shown that the first four moments of W are 


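(7)  $\mu_1'$ (about 0) $= \dfrac{1}{m}$ 

(8)  $\mu_2 = \dfrac{2(m - 1)}{m^3(n - 1)}$ 

(9)  $\mu_3 = \dfrac{8(m - 1)(m - 2)}{m^5(n - 1)^2}$ 

(10)  $\mu_4 = \dfrac{24(m - 1)}{m^7(n - 1)^3}\left\{\dfrac{25n^3 - 38n^2 - 35n + 72}{25(n^2 + n)} + 2(n - 1)(m - 2) + \dfrac{n + 3}{2}(m - 2)(m - 3)\right\}.$ 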


¹ The Spearman rank correlation coefficient is the product-moment coefficient of correlation 
between the ranks considered as ordinary variate values. $\rho_{av}$ is the intraclass correlation 
coefficient for the m sets of ranks, also considered as variate values. 




Results equivalent to these for the first three moments were given by Fried- 
man [1]; and for the first four moments by Pitman [5]. 
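
For very small m and n the first two moments can be confirmed by brute force. The sketch below (function name ours) enumerates every arrangement with one ranking held fixed and compares the exact mean and variance of W with $1/m$ and $2(m - 1)/\{m^3(n - 1)\}$; it is only practical for small cases because the number of arrangements is $(n!)^{m-1}$.

```python
from fractions import Fraction
from itertools import permutations, product

def exact_moments_of_W(m, n):
    """Exact mean and variance of W over all arrangements, first ranking held fixed."""
    perms = list(permutations(range(1, n + 1)))
    fixed = perms[0]                                  # the ranking 1, 2, ..., n
    max_S = Fraction(m ** 2 * (n ** 3 - n), 12)
    mean_sum = Fraction(m * (n + 1), 2)
    values = []
    for others in product(perms, repeat=m - 1):
        sums = [sum(col) for col in zip(fixed, *others)]
        S = sum((s - mean_sum) ** 2 for s in sums)
        values.append(S / max_S)
    mean = sum(values) / len(values)
    var = sum((w - mean) ** 2 for w in values) / len(values)
    return mean, var

mean, var = exact_moments_of_W(3, 3)
print(mean, var)
```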

In a valuable contribution to the subject Friedman showed that the distribution 
of $\chi_r^2$ tends to that of $\chi^2$ with (n - 1) degrees of freedom as m tends to 
infinity and suggested the use of $\chi_r^2$ (equation (5)) for an ordinary test of significance 
in the $\chi^2$ distribution. This is satisfactory for moderately large values, 
but for small values it is subject to the disadvantage inherent in any attempt 
to represent a distribution of finite range by one of infinite range—the fit near 
the tails is not likely to be very good. An improvement is obtained by noting 
that the first four moments of the Type I distribution, 


(11)  $dF = \dfrac{1}{B(p, q)}\, W^{p-1}(1 - W)^{q-1}\, dW,$ 


are approximately those of W if m and n are moderately large, and 


(12)  $p = \dfrac{n - 1}{2} - \dfrac{1}{m},$ 


(13)  $q = (m - 1)\left\{\dfrac{n - 1}{2} - \dfrac{1}{m}\right\}.$ 

For practical purposes it is most convenient to put 


(14)  $z = \tfrac{1}{2}\log_e \dfrac{(m - 1)W}{1 - W}$ 


so that z can be tested in Fisher's distribution with $(n - 1) - \dfrac{2}{m}\ (= n_1)$ and 
$(m - 1)\left\{(n - 1) - \dfrac{2}{m}\right\}\ (= n_2)$ degrees of freedom. 

1 

There can be little doubt that this test is quite reliable for moderate values 
of m and n; but it has hitherto been far from clear how reliable it is for low 
values of m and n. This point we attempt to clear up in the present paper. 
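
As an illustration of equation (14), the sketch below computes W, z and the two (generally fractional) degrees of freedom from S, m and n. The function name `concordance_z` is ours; the check uses the case n = 4, m = 6, S = 100 discussed later in the paper. The function assumes S is an integer or a Fraction and that W < 1.

```python
import math
from fractions import Fraction

def concordance_z(S, m, n):
    """Return (W, z, n1, n2) for the z-test of the coefficient of concordance."""
    W = Fraction(12 * S, m ** 2 * (n ** 3 - n))
    z = 0.5 * math.log(float((m - 1) * W / (1 - W)))
    n1 = Fraction(n - 1) - Fraction(2, m)     # degrees of freedom n_1
    n2 = (m - 1) * n1                         # degrees of freedom n_2
    return W, z, n1, n2

W, z, n1, n2 = concordance_z(100, 6, 4)
print(float(W), round(z, 3), n1, n2)
```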


3. Distribution of S. For the case m = 2 the distribution of S is the same 
as the distribution of the $S(d^2)$ used in calculating Spearman's rank correlation 
coefficient. A table showing the distribution up to and including n = 8 has 
already been given (Kendall and others, [4]). Tables giving probabilities that 
specified values of $\chi_r^2$ would be attained or exceeded were given by Friedman [1] 
for n = 3, m = 2-9; and n = 4, m = 2-4. We have taken this work somewhat 
further and obtained the distributions for n = 3, m = 2-10; n = 4, m = 2-6; 
and n = 5, m = 3. Tables 1-4 give the probabilities based on these distributions. 

These distributions were obtained by two methods. The first consisted of 
building up the distribution for (m + 1) and n from that of m and n. For 







TABLE 1 


Probability that a given value of S will be attained or exceeded, for n = 3 and values 
of m from 2 to 10 


Values of m 





eo eS t et wT 





000 (1.000 1.000 1.000 | 
.954 | .956 | .964 |. | .971 
.691 | .740 | .768 |. .814 
.522 | .570 | .620 |. .685 
.367 | .430 | .486 | . .569 
.182 | .252 | .305 | .355 | .398 
124 | .184 | | 

.093 | .142 | .192 
.039 | .072 

024 | .052 

| .029 

| 012 | . 

| 0081 | . 
0055 | . 

| 0017 | . 

| 013 | . 








° 

























TABLE 2 


Probability that a given value of S will be attained or exceeded for n = 4 and 
m = 3 and 5 


m=5 ; m=5 


1.000 .055 
975 | 044 
.944 .034 

.857 | | 031 

771 3 | 023 

.709 g | 020 

.652 77 | 017 

.561 8 | 012 

.521 .0087 

445 |  .0067 
| 0055 

.0031 
.0023 
.0018 
. 226 9¢ .0016 
.210 | 0014 

.162 | 0°64 

141 .0°33 

.123 .0°21 

51 | .107 .0°14 
53 | .093 .0448 
57 .075 .0°30 

59 | .067 


example, with m = 2 and n = 3 we have the following values of the sums of 
ranks, measured about their mean: 


Type              Frequency 

-2,  0,  2            1 
-2,  1,  1            2 
-1,  0,  1            2 
 0,  0,  0            1 

Here -2, 1, 1, and 2, -1, -1 are taken to be identical types, for they give the 
same value of S and will also give similar types when we proceed to m = 3 as 
follows. 

In the case m = 3 each of the above types will appear added to the six permutations 
of -1, 0, 1; e.g. the type -2, 0, 2 will give one each of -3, 0, 3; -3, 1, 2; 



















































































































1.000 
.958 
.833 
792 
.625 
. 042 
458 
.375 
. 208 
. 167 
.042 








m=2 








1.000 
. 992 
. 928 
.900 
.800 
754 
.677 
.649 
524 
.508 
.432 
.389 
.395 
.324 
. 242 
. 200 
. 190 
. 158 
141 
- 105 
.094 
.077 
.068 
054 
.052 
.036 
033 
.019 
.014 
.012 
. 0069 
. 0062 
.0027 
.0027 
.0016 
. 0°94 
. 0°94 
. 0994 

0472 


TABLE 3 
Probability that a given value of S will be attained or exceeded for n = 4 and 
m = 2, 4 and 6 


m= 4 





1.000 
.996 
.957 
. 940 
874 
.844 
789 
772 
.679 
. 668 
.609 
.574 
.541 
.512 
.431 
.386 
.375 
.338 
317 
.270 
. 256 
. 230 
.218 
197 
. 194 
. 163 
. 155 
127 
.114 
. 108 
.089 
.088 
.073 
. 066 
.060 
056 
.043 
.041 


m=6 


84 

86 

88 

90 

94 

96 

98 
100 
102 
104 
106 
108 
110 
114 
116 
118 
120 
122 
126 
128 
130 
132 
134 
136 
138 
140 
144 
146 
148 
150 
152 
154 
158 
160 
162 
164 
170 


4 and 





.022 
.017 
.014 
.013 
.010 
.0096 
.0085 
.0073 
.0061 
.0057 
.0040 
.0033 
.0028 
.0023 
.0020 
.0015 
. 0°90 
. 0°87 
. 0°73 
. 0°65 
. 0°40 
. 0°36 
. 0°28 
. 0°24 
. 0°22 
0712 
0495 
. 062 
. 0446 
. 0424 
0416 
0412 
0°80 
. 0524 


TABLE 4 


Probability that a given value of S will be attained or exceeded, for n = 5 and m = 3 


m=3 S 


1.000 44 
1.000 46 

. 988 

.972 

941 

.914 

845 

.831 

. 768 

.720 

.682 

.649 

995 

.009 .026 

493 .017 
30 475 .015 
32 .432 .0078 
34 .406 .0053 
36 347 .0040 
38 .326 .0028 
40 .291 . 090 
42 253 90 . 069 


-2, -1, 3; -2, 1, 1; -1, -1, 2; and -1, 0, 1. These types are then counted 
for each of the four basic types of m = 2 and we get: 


Type              Frequency 

-3,  0,  3            1 
-3,  1,  2            6 
-2,  0,  2            6 
-2,  1,  1            6 
-1,  0,  1           15 
 0,  0,  0            2 

The case m = 4 is treated by considering the numbers of types obtained by 
adding the six permutations of —1, 0, 1 to the types for m = 3; and so on. 
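
The building-up procedure lends itself to a short program. The following is a sketch under our own conventions, not the authors' working: types are stored as sorted vectors of rank sums, so vectors differing only in the order of the objects are merged, while the further merging of mirror-image types used in the text is omitted for simplicity. One ranking is held fixed, so the frequencies total $(n!)^{m-1}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def distribution_of_S(m, n):
    """Frequencies of S over the (n!)**(m-1) arrangements with one ranking held fixed."""
    perms = list(permutations(range(1, n + 1)))
    types = Counter({perms[0]: 1})            # the single fixed ranking (m = 1)
    for _ in range(m - 1):
        nxt = Counter()
        for sums, freq in types.items():
            for p in perms:
                # Add one more ranking; sorting merges equivalent types.
                key = tuple(sorted(a + b for a, b in zip(sums, p)))
                nxt[key] += freq
        types = nxt
    mean = Fraction(m * (n + 1), 2)
    dist = Counter()
    for sums, freq in types.items():
        dist[sum((s - mean) ** 2 for s in sums)] += freq
    return dict(dist)

d33 = distribution_of_S(3, 3)
print(sorted((int(S), f) for S, f in d33.items()))
```

Cumulating the frequencies from the largest S downward and dividing by the total gives probabilities of the kind tabulated in Tables 1-4.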

This method is quite convenient for n = 2 and n = 3. For n = 4 it becomes 
difficult owing to the labour of considering 24 permutations at each stage and to 
the increase in the number of types. For n = 5 there are 120 permutations and 
the labour becomes excessive. 
















The second method employed is a generalisation of a procedure we used for the 
Spearman coefficient. Taking first of all the case m = 2, consider the array 


$\begin{matrix} a^{(1+1)} & a^{(1+2)} & \cdots & a^{(1+n)} \\ a^{(2+1)} & a^{(2+2)} & \cdots & a^{(2+n)} \\ \vdots & \vdots & & \vdots \\ a^{(n+1)} & a^{(n+2)} & \cdots & a^{(n+n)} \end{matrix}$ 


Any permissible set of values of the sums of ranks is obtained by selecting n 
entries from this array so that no entry appears more than once in the same row 
or column. If then, subtracting from each index the mean (n + 1) and squaring, 
we write 


(15)  $E = \begin{matrix} a^{(1-n)^2} & a^{(2-n)^2} & \cdots & a^{0} \\ a^{(2-n)^2} & a^{(3-n)^2} & \cdots & a^{1} \\ \vdots & \vdots & & \vdots \\ a^{0} & a^{1} & \cdots & a^{(n-1)^2} \end{matrix}$ 


the values of S are the powers of a in E when it is expanded as a sum of n! terms 
each of which is obtained by multiplying n factors which do not appear in the 
same row or column. The distribution of S is arrayed by the expansion of E, 
the number of values of any S being the coefficient of $a^S$ in the expansion. 
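
For m = 2 the expansion of E can be carried out mechanically: each of the n! terms takes one factor from every row with no column repeated, and the exponents add. The sketch below (function name ours) assumes the entry in row i and column j is a raised to the power $(i + j - n - 1)^2$, as described above, and collects the coefficient of each power.

```python
from collections import Counter
from itertools import permutations

def expand_E(n):
    """Coefficient of a**S in the expansion of the two-dimensional E-function."""
    # Exponent in row i, column j (both 1-based): (i + j - (n + 1))**2.
    exponent = [[(i + j - (n + 1)) ** 2 for j in range(1, n + 1)]
                for i in range(1, n + 1)]
    coeffs = Counter()
    for cols in permutations(range(n)):
        # One term of the expansion: one factor per row, no column used twice.
        S = sum(exponent[row][col] for row, col in enumerate(cols))
        coeffs[S] += 1
    return dict(coeffs)

print(sorted(expand_E(3).items()))
print(sorted(expand_E(4).items()))
```

For n = 4 the cumulated frequencies, divided by 24, agree with the m = 2 probabilities of Table 3 (for example 1/24 = .042 for S = 20).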

Similarly, for m rankings, the distribution of S is given by the expansion of an 
m-dimensional E-function. For example, with m = 3 there would be a three-dimensional 
E-function the bottom plane of which would be 


$\begin{matrix} a^{\{3 - \frac{3}{2}(n+1)\}^2} & \cdots & a^{\{n+2 - \frac{3}{2}(n+1)\}^2} \\ \vdots & & \vdots \\ a^{\{n+2 - \frac{3}{2}(n+1)\}^2} & \cdots & a^{\{2n+1 - \frac{3}{2}(n+1)\}^2} \end{matrix}$ 


The plane above this would be 


$\begin{matrix} a^{\{4 - \frac{3}{2}(n+1)\}^2} & \cdots & a^{\{n+3 - \frac{3}{2}(n+1)\}^2} \\ \vdots & & \vdots \\ a^{\{n+3 - \frac{3}{2}(n+1)\}^2} & \cdots & a^{\{2n+2 - \frac{3}{2}(n+1)\}^2} \end{matrix}$ 


and so on. 

The E-function is difficult to handle in more than three dimensions, but for 
the two and three dimensional case it is manageable and we used it to obtain 
the distribution of S for n = 5 and m = 3. 









4. Adequacy of the z-test. Tables 1-4 provide exact tests for the values of 
m and n there given. It remains to be seen how good the ordinary z-test applied 
to W would be for higher values. It may be presumed that if the test is satisfactory 
for any particular value of m and n for which exact results are available, 
it will be so for higher values of m and n. Since, for ordinary purposes the 
significance points of z as tabled by Fisher and Yates [2] would be employed, 
the most useful comparison would seem to be between those tables and the 
extreme values of Tables 1-4. 

For n = 3, m = 9, the 1% level is given approximately by S = 78. We have, 
testing for such a value, W = 0.4814, z = 1.002, $n_1 = 16/9$, $n_2 = 128/9$. By linear 
interpolation of reciprocals in the tables of z we find for the 1% point and these 
degrees of freedom z = 0.954. The correspondence is hardly satisfactory, and 
the z test might lead to incorrect inferences in practice. Matters improve a 
good deal, however, if we make continuity corrections, by subtracting unity 
from S before calculating W, and increasing by two the divisor $m^2(n^3 - n)/12$, 
so as to allow for the finite range. In this case z = 0.979. 

For n = 4, m = 6 the 1% point is approximately S = 100. We have W = 
0.5556, z = 0.916, $n_1 = 8/3$, $n_2 = 40/3$. By linear interpolation as before we 
find z = 0.888. 

Continuity corrections again materially improve the agreement, giving a 
value of z = 0.893. 

For n = 5, m = 3 there is no very convenient value of S close to the 1% point.
For P = 0.015, S = 74 and for P = 0.0078, S = 76.

For S = 74 (with continuity corrections) z = 1.020
    S = 76 (with continuity corrections) z = 1.089

By interpolation from the tables z = 1.075. The use of the z test would lead
to the correct conclusion that a value of S equal to 74 falls below, and that of
76 above, the 1% point.
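The comparisons above can be checked numerically. The following is a minimal sketch, assuming W = 12S/[m²(n³ − n)] and z = ½ logₑ[(m − 1)W/(1 − W)], which is consistent with the values quoted in this section; the function name `z_from_s` and its `corrected` flag are illustrative and not part of the original paper.

```python
import math

def z_from_s(s, n, m, corrected=False):
    """Return (W, z) for a sum of squares S from m rankings of n objects.

    Assumes W = 12 S / (m^2 (n^3 - n)) and z = 0.5 * ln((m - 1) W / (1 - W)).
    With corrected=True, unity is subtracted from S and the divisor
    m^2 (n^3 - n) / 12 is increased by two, as described in the text.
    """
    divisor = m * m * (n ** 3 - n) / 12
    if corrected:
        s, divisor = s - 1, divisor + 2
    w = s / divisor
    z = 0.5 * math.log((m - 1) * w / (1 - w))
    return w, z

if __name__ == "__main__":
    # (n, m, S) triples discussed in the text
    for n, m, s in [(3, 9, 78), (4, 6, 100), (5, 3, 74), (5, 3, 76)]:
        _, z_raw = z_from_s(s, n, m)
        _, z_cc = z_from_s(s, n, m, corrected=True)
        print(f"n={n} m={m} S={s}: z={z_raw:.3f}, corrected z={z_cc:.3f}")
```

The corrected values agree with those quoted to about three decimals; the tabled significance points themselves still have to be taken from Fisher and Yates [2].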

For values of m and n not included in Tables 1-4 it thus appears that the z- 
test with continuity corrections will give sufficiently accurate results, if n is 
greater than 3, at the 1% points. It may be presumed that the results at the 5% 
points are equally good and probably better. But for finer values of signifi- 
cance, such as 0.1%, it is doubtful whether the test is sound. The tails of the 
distribution of S for moderate values of m and n are very irregular. 

For instance, the following is the tail of the distribution of S for n = 3, m = 10
(the total distribution being 10,077,696):

  S    Frequency       S    Frequency

  96     11,340       146       740
  98     30,090       150       252
 104     13,830       152       420
 114      7,380       158       240
 122      4,200       162        90
 126      3,240       168        90
 128      1,450       182        20
 134      1,860       200         1














and the following is the tail for n = 4, m = 6 (the total being 7,962,624):

  S   Frequency      S   Frequency      S   Frequency
 100     5536       122     4100       146      810
 102     8160       126     4480       148      225
 104    10260       128      240       150      264
 106     8850       130     1152       152      120
 108     3920       132      660       154      180
 110    13344       134     1980       158       60
 114     5460       136      300       160       36
 116     3870       138      600       162       30
 118     3900       140      312       164       45
 120     2472       144      100       170       18
                                       180        1


Irregularities of this kind run all through the distributions we have obtained, 
and frequency diagrams present the same sort of features we have noticed in 
the case m = 2 (Kendall and others, [4]). The representation of such distribu- 
tions by continuous functions, no matter how close their lower moments, is 
obviously to be used with some care. Although the B-distribution or the asso- 
ciated z-distribution will give reasonable significance tests at levels of 1% or 
greater, they will probably be inadequate to represent frequencies occurring in 
narrow ranges. 
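Tails of this kind can be recomputed by direct enumeration. The following is a minimal sketch, assuming (as the quoted totals 6⁹ and 24⁵ suggest) that the first ranking is held fixed and the remaining m − 1 rankings run over all permutations; the function name `distribution_of_s` is illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def distribution_of_s(n, m):
    """Exact frequencies of S for m rankings of n objects, first ranking fixed.

    Rank-sum vectors are built up one ranking at a time, so the total
    frequency is (n!)^(m-1).
    """
    perms = list(permutations(range(1, n + 1)))
    sums = Counter({tuple(range(1, n + 1)): 1})  # first ranking fixed as 1..n
    for _ in range(m - 1):
        nxt = Counter()
        for state, count in sums.items():
            for p in perms:
                nxt[tuple(a + b for a, b in zip(state, p))] += count
        sums = nxt
    total = m * n * (n + 1) // 2  # every rank-sum vector adds up to this
    freq = Counter()
    for state, count in sums.items():
        # S = sum of squared deviations of the rank sums from their mean,
        # computed exactly as (n * sum r^2 - total^2) / n
        s = Fraction(n * sum(r * r for r in state) - total * total, n)
        freq[s] += count
    return freq

if __name__ == "__main__":
    tail = distribution_of_s(3, 10)
    for s in sorted(tail):
        if s >= 96:
            print(int(s), tail[s])
```

The irregular jumps between neighbouring values of S are visible directly in the printed frequencies.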

























5. Some Experimental Distributions. In some previous work we obtained a 
number of random permutations of the numbers 1-10 and 1-20. These were 
used to derive some experimental distributions of S which may be worth re- 
cording. Table 5 gives the distribution for 200 sets of pentads of 10 and 
Table 6 that for 100 triads of 20. In the distribution of Table 5, the mean of 
the grouped distribution is 404. The theoretical mean is 412.5 with a standard 
error of 12.3. In Table 6 the mean is 1936, the theoretical mean being 1995 
with s.e. 53. The distributions accord quite well with expectation. 
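The theoretical figures quoted here can be reproduced with a few lines. This is a sketch under the assumption that, for m independent random rankings of n objects, E(S) = m(n³ − n)/12 and var(S) = m(m − 1)(n − 1)n²(n + 1)²/72; these expressions are not restated in this section, and the function names are illustrative.

```python
import math

def theoretical_mean_s(m, n):
    """Mean of S under random ranking (assumed formula m (n^3 - n) / 12)."""
    return m * (n ** 3 - n) / 12

def theoretical_sd_s(m, n):
    """Standard deviation of S under random ranking (assumed variance formula)."""
    var = m * (m - 1) * (n - 1) * n ** 2 * (n + 1) ** 2 / 72
    return math.sqrt(var)

def standard_error_of_mean(m, n, sets):
    """Standard error of the mean of S over `sets` independent sets."""
    return theoretical_sd_s(m, n) / math.sqrt(sets)

if __name__ == "__main__":
    print(theoretical_mean_s(5, 10), standard_error_of_mean(5, 10, 200))
    print(theoretical_mean_s(3, 20), standard_error_of_mean(3, 20, 100))
```

With these assumptions the observed means 404 and 1936 lie within about one standard error of 412.5 and 1995 respectively.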

TABLE 5
Experimental Distribution of S in 200 sets (m = 5, n = 10)

TABLE 6
Experimental Distribution of S in 100 sets (m = 3, n = 20)

[Bodies of Tables 5 and 6 not recoverable from the scan.]

In conclusion we give two examples to illustrate some points of importance
in ranking work. The first is a case in which ranks appear as the primary
variate and in which the assumption of normality is clearly illegitimate.

6. Example 1. In some experiments in random series a pack of ordinary
playing cards was shuffled and the order of the 13 cards of each suit from the
top of the pack was noted. The pack was then reshuffled and again the orders
noted. This was done 28 times. The question we wished to discuss was
whether the shuffling was good, in the sense that the cards were thoroughly
mixed at each shuffle.

Here, for each suit, say diamonds, we have 28 rankings of 13. The sums of
ranks were 183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220. The
mean is 196, and S = 11522, W (without continuity corrections, which are not
worth making for these values of m and n) = 0.08075, z (equation (14)) = 0.432.
This falls just beyond the 1% point.

Similarly for the clubs W was found to be 0.0535; for the hearts, 0.0245; and 
for spades, 0.0342. None of these values is significant and we conclude that the 
randomisation introduced by the shuffling was good, at all events, so far as this 
test was concerned. It may be added that the shuffling was done with much 
more care than would be taken in an ordinary game of cards. 
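The arithmetic for the diamonds can be checked directly from the sums of ranks. The sketch below assumes W = 12S/[m²(n³ − n)] and z = ½ logₑ[(m − 1)W/(1 − W)], which reproduce the quoted figures; the function name `concordance` is illustrative.

```python
import math

def concordance(rank_sums, m):
    """Return (S, W, z) from the n sums of ranks given by m rankings."""
    n = len(rank_sums)
    mean = m * (n + 1) / 2                      # mean of the sums of ranks
    s = sum((r - mean) ** 2 for r in rank_sums)
    w = 12 * s / (m * m * (n ** 3 - n))         # no continuity corrections
    z = 0.5 * math.log((m - 1) * w / (1 - w))   # assumed form of the z-transform
    return s, w, z

if __name__ == "__main__":
    diamonds = [183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220]
    s, w, z = concordance(diamonds, m=28)
    print(f"S = {s:.0f}, W = {w:.5f}, z = {z:.3f}")
```

Whether a given z is significant still has to be judged against the tabled points for the appropriate degrees of freedom.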

In psychological work there has sometimes been a confusion between the 
determination of a measure of agreement between subjects and that of an ob- 
jective order based on experimental rankings. It may therefore be as well to 
point out that in its psychological applications the test of W is one of concord- 
ance between judgments. There may be quite a high measure of agreement 
about something which is incorrect. 

































7. Example 2. A number of students were given 12 photographs of persons 
unknown to them, and asked to rank them in what they judged from the photo- 
graphs to be their intelligence. For 16 students the sums of ranks were 


112, 94, 101, 84, 97, 75, 104, 84, 102, 146, 125, 124 


The mean is 104, S = 4472, W = 0.1222 and z = 0.368, which is barely significant,
being between the 1% and the 5% points.
For 111 students the sums were 


818, 670, 908, 410, 706, 526, 780, 485, 596, 1044, 959, 756 
W = 0.2378, z = 1.768 


This is highly significant and it is to be inferred that community of judgment 
exists between students or groups of students. But there was little relationship 
between the judgments and the intelligence of the photographed subjects as 
given by the Binet Intelligence Quotient. 






Note added in proof: 
While this paper was passing through the press Professor W. Allen Wallis, of Stanford
University, kindly drew our attention to some unpublished work of his own on this subject.
Professor Wallis had also arrived at the coefficient W which, he points out, is the
ranking analogue of the correlation ratio. His paper is, we understand, on the point of
publication.


REFERENCES 
[1] M. Friedman, "The Use of Ranks to Avoid the Assumption of Normality Implicit in the
Analysis of Variance," Jour. Am. Stat. Assn., Vol. 32 (1937), p. 675.
[2] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
Research, 1938, Oliver & Boyd, Edinburgh.
[3] T. L. Kelley, Statistical Methods, 1927.
[4] M. G. Kendall, Sheila F. H. Kendall, and Bernard Babington Smith, "The Distribution
of Spearman's Coefficient of Rank Correlation in a Universe in which all Rankings
Occur an Equal Number of Times," Biometrika, Vol. 30 (1938), p. 251.
[5] E. J. G. Pitman, "Significance Tests which may be applied to Samples from any Populations:
III. The analysis of variance," Biometrika, Vol. 29 (1938), p. 322.
[6] B. L. Welch, "On the z-test in Randomised Blocks and Latin Squares," Biometrika,
Vol. 29 (1937), p. 21.















LONDON, ENGLAND
AND
UNIVERSITY OF ST. ANDREWS,
SCOTLAND.


NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 




THE ALLOCATION OF SAMPLINGS AMONG SEVERAL STRATA 
By J. Stevens Stock AND Lester R. FRANKEL 


1. Introduction. The problem of selecting a random sample so as to obtain 
optimum precision in making estimates has been the subject of inquiries by 
Bowley,¹ Neyman,² Sukhatme³ and others. In estimating an average value of a
variate in a population it is often profitable to stratify the universe into several 
homogeneous parts and sample at random within each of these parts. In order 
to obtain maximum efficiency for a given size of sample it appears that the 
number of samplings from each stratum should be proportional to the standard 
deviation of the characteristic under consideration and to the total number of 
units within the stratum. By distributing the sample in such a manner optimum 
precision will be obtained in estimating a general average. 

However, it often happens that it is not the purpose of an investigation to 
study the aggregate of the universe. Evaluations and interrelations of char- 
acteristics in different groups or strata within the universe may be of importance. 
Thus, in cost-of-living surveys in a number of urban centers the object is to 
compare costs among the cities of different backgrounds. In such cases it is 
desirable for each city to have equal reliability so that each one may be treated 
as a unit. There are many other situations in the social sciences where analyses
of this type are of importance. 


2. The Problem. In general, the sampling problem is: Given several well 
defined areas of study and a fixed number of observations with which to make 
the survey, how best to distribute the observations such that each area will be 
represented with equal precision. 


There are n observations to be distributed among m areas or strata. In the
i-th stratum, if $N_i$ is the total number of units, $S_i^2$ the variance of the characteristic
to be measured, and $n_i$ the size of the sample, the sampling error $\sigma_i$ of the
arithmetic mean is given by

$$\sigma_i^2 = \frac{S_i^2}{n_i}\,\frac{N_i - n_i}{N_i - 1}. \tag{1}$$

The problem then is, given $N_i$, numbers proportional to $S_i^2$, and n, to determine
$n_i$ such that

$$\sigma_1 = \sigma_2 = \cdots = \sigma_m.$$

¹ A. L. Bowley, "Measurement of the precision attained in sampling," Bulletin de l'Institut
International de Statistique, 1926, Rome, Tome XXII, 1-ère Livraison, 3-ème partie,
pp. 1-62 (supplement).

² J. Neyman, "On the two different aspects of the representative method," Journal of
the Royal Statistical Society, 1934, pp. 558-625.

³ P. V. Sukhatme, "Contribution to the theory of the representative method," Supplement
to the Journal of the Royal Statistical Society, Vol. II, 1935, pp. 253-268.


3. First Solution. If we assume that the variances $S_i^2$ are all equal and that
for $N_i - 1$ we may substitute $N_i$, we have

$$\frac{N_1 - n_1}{n_1 N_1} = \frac{N_2 - n_2}{n_2 N_2} = \cdots = \frac{N_m - n_m}{n_m N_m}. \tag{2}$$

From the total amount of money available and the cost per sampling unit we can
determine the total number of observations to be apportioned among the m
populations,

$$n = \sum_{i=1}^{m} n_i. \tag{3}$$

We are able then to write m equations in m unknowns. From (2) we may write
m − 1 equations

$$\frac{1}{n_1} - \frac{1}{N_1} = \frac{1}{n_j} - \frac{1}{N_j} \qquad (j = 2, 3, \ldots, m), \tag{4}$$

and from (3) we may write one equation,

$$n_1 + n_2 + \cdots + n_m = n. \tag{5}$$

But equations (4) are not easily soluble in their present form; they can be made
linear by writing the approximation

$$\frac{1}{n_i} = \frac{1}{L_i(1 + a_i)} \approx \frac{1 - a_i}{L_i},$$

where $L_i$ is some reasonable approximation of $n_i$ chosen such that

$$\sum_{i=1}^{m} L_i = \sum_{i=1}^{m} n_i,$$

and $a_i$ is some small correction for $L_i$ to be determined. We have then approximately,

$$\frac{1 - a_1}{L_1} - \frac{1}{N_1} = \frac{1 - a_j}{L_j} - \frac{1}{N_j} \qquad (j = 2, 3, \ldots, m), \tag{6}$$

and from equation (5) we get

$$a_1 L_1 + a_2 L_2 + \cdots + a_m L_m = 0. \tag{7}$$

If we write

$$\phi_j = L_1 L_j \left(\frac{1}{N_1} - \frac{1}{N_j}\right) + L_1 - L_j,$$

we may write (6) and (7) in the following form:

$$\begin{aligned}
-L_2 a_1 + L_1 a_2 &= \phi_2 \\
-L_3 a_1 + L_1 a_3 &= \phi_3 \\
&\;\;\vdots \\
-L_m a_1 + L_1 a_m &= \phi_m \\
L_1 a_1 + L_2 a_2 + L_3 a_3 + \cdots + L_m a_m &= 0
\end{aligned} \tag{8}$$

The matrix of the coefficients is

$$\begin{pmatrix}
-L_2 & L_1 & 0 & \cdots & 0 \\
-L_3 & 0 & L_1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
-L_m & 0 & 0 & \cdots & L_1 \\
L_1 & L_2 & L_3 & \cdots & L_m
\end{pmatrix} \tag{9}$$

From this matrix we find that

$$a_1 = -\frac{\sum_{j=2}^{m} \phi_j L_j}{\sum_{i=1}^{m} L_i^2}, \tag{10}$$

and from the general form of equation (8) we have

$$a_j = \frac{\phi_j + L_j a_1}{L_1}. \tag{11}$$

These two equations (10) and (11) give us all the $a_i$. It is then only necessary
to compute the second approximations of $n_i$ by

$$L_i' = L_i(1 + a_i) \approx n_i. \tag{12}$$


Closer approximations, though perhaps unnecessary, can be made by repeating 
the computation with the next approximations. The final approximations 
may be checked by substituting them in equations (4). 


4. Second Solution. Sometimes the numbers $S_i^2$ are known, or at least proportionate
numbers can be estimated with a fair degree of accuracy for each area.
We shall call these proportionate numbers $\xi_i^2$. We now have the conditions

$$\xi_1^2\,\frac{N_1 - n_1}{n_1 N_1} = \xi_2^2\,\frac{N_2 - n_2}{n_2 N_2} = \cdots = \xi_m^2\,\frac{N_m - n_m}{n_m N_m}, \tag{13}$$

and as before

$$\sum_{i=1}^{m} n_i = n.$$

We may write m equations in m unknowns, $a_i$, using the approximations $L_i$
as before:⁴

$$\begin{aligned}
-S_1^2 L_2 a_1 + S_2^2 L_1 a_2 &= \delta_2 \\
-S_1^2 L_3 a_1 + S_3^2 L_1 a_3 &= \delta_3 \\
&\;\;\vdots \\
-S_1^2 L_m a_1 + S_m^2 L_1 a_m &= \delta_m \\
L_1 a_1 + L_2 a_2 + \cdots + L_m a_m &= 0
\end{aligned} \tag{14}$$

where

$$\delta_j = L_1 S_j^2 - L_j S_1^2 + L_1 L_j \left(\frac{S_1^2}{N_1} - \frac{S_j^2}{N_j}\right). \tag{15}$$

Solving these m linear equations for $a_i$ we get

$$a_1 = -\frac{\sum_{j=2}^{m} \delta_j L_j / S_j^2}{S_1^2 \sum_{i=1}^{m} L_i^2 / S_i^2},$$

and from the general form of equations (14) we have

$$a_j = \frac{\delta_j + S_1^2 L_j a_1}{S_j^2 L_1}.$$

These $a_i$ may be applied as before to the approximations $L_i$ for new approximations
$L_i'$ of the numbers $n_i$.
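One pass of the second solution can be sketched as follows. This is a minimal illustration, assuming the linear system (14) with right-hand sides (15) and the correction applied as L(1 + a); the function name `second_solution_step` and the three-stratum data are hypothetical, chosen only to exercise the formulas.

```python
def second_solution_step(L, N, S2):
    """One correction step of the second solution.

    L  : current approximations to the sample sizes n_i
    N  : stratum sizes N_i
    S2 : numbers proportional to the variances S_i^2
    Returns the new approximations L_i (1 + a_i).
    """
    L1, N1, V1 = L[0], N[0], S2[0]
    # delta_j of equation (15), for j = 2..m
    delta = [L1 * Vj - Lj * V1 + L1 * Lj * (V1 / N1 - Vj / Nj)
             for Lj, Nj, Vj in zip(L[1:], N[1:], S2[1:])]
    num = sum(d * Lj / Vj for d, Lj, Vj in zip(delta, L[1:], S2[1:]))
    den = V1 * sum(Li * Li / Vi for Li, Vi in zip(L, S2))
    a1 = -num / den
    a = [a1] + [(d + V1 * Lj * a1) / (Vj * L1)
                for d, Lj, Vj in zip(delta, L[1:], S2[1:])]
    return [Li * (1 + ai) for Li, ai in zip(L, a)]

if __name__ == "__main__":
    # Hypothetical strata: sizes, relative variances, and rough first guesses
    N = [4000, 10000, 25000]
    S2 = [1.0, 2.0, 4.0]
    L = [500.0, 900.0, 1600.0]          # adds up to n = 3000
    for _ in range(30):
        L = second_solution_step(L, N, S2)
    print([round(x, 1) for x in L])
    print([v * (1 / x - 1 / Ni) for v, x, Ni in zip(S2, L, N)])
```

Because the last equation of (14) forces the weighted corrections to sum to zero, the total of the approximations is preserved at every step.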


5. Remarks. (i) In either case the corrections may be applied to the
approximations $L_i$ in two different ways:

$$L_i' = L_i(1 + a_i), \tag{16}$$

$$L_i' = \frac{L_i}{1 - a_i}. \tag{17}$$

When the corrections are applied according to (16) the sum of the new approximations
adds up correctly to the total n, and no further adjustment need be
made in the $L_i'$ either for repeating the operation again for nearer approximations
or for final use. If, however, the corrections are relatively large, say
greater than .10, there seems to be better convergence with the second approximations
if formula (17) is used and the resulting $L_i'$ adjusted proportionately
such that they add up to n. These numbers then can again be adjusted with
new $a_i$ for final approximations.

⁴ The numbers $S_i^2$ and $\xi_i^2$ may be used interchangeably since they are by hypothesis
proportional.

(ii) The numbers $S_i^2$ or $\xi_i^2$ are not always estimable. If they are not known
at all or are known to be all nearly equal the first solution is perhaps the more
useful. If these numbers are known, and known to be different, the second
solution is necessary. However, some saving in computation by the second
method may be effected if the approximations $L_i$ are first adjusted by the first
solution before being entered into the computation of the second solution.

(iii) Further accuracy, though perhaps unnecessary, may be attained in the
second solution by substituting throughout $S_i'^2$ for $S_i^2$ where

$$S_i'^2 = \frac{N_i}{N_i - 1}\,S_i^2.$$

This substitution eliminates any slight inaccuracies caused by substituting
$N_i$ for $N_i - 1$.


(iv) The initial approximations $L_i$ may in almost every case be gotten from
the following formula:

$$L_i = \frac{n}{m} - \left(\frac{n}{m}\right)^2 \left(\frac{1}{N_i} - \frac{1}{m}\sum_{j=1}^{m}\frac{1}{N_j}\right).$$


(v) In all that has been presented above it has been assumed that the sample 
has been drawn without replacements from a finite universe. Whether or not 
this assumption is tenable depends upon the particular object of the research. 


6. Example. In the Survey of Youth in the Labor Market conducted by the 
Division of Research in the Works Progress Administration youth who com- 
pleted the eighth grade in the school years 1928-1929, 1930-1931 and 1932-1933 
were studied. In six cities, Duluth, Denver, Birmingham, Seattle, San Fran- 
cisco, and St. Louis random samples from school records were selected. Funds 
permitted the use of 40,000 schedules. 

From school records it was possible to determine the total number of eighth 
grade graduates in each city for the years in question. The problem arose then 
as to what would be the most efficient method of distributing these 40,000 sched- 
ules among the six cities in order to compare the problems of youth. 

Assuming equal variances within cities, quotas were computed for each of the 
cities. From Table 1, summarizing the computations, it can be seen that the 
quotas fall somewhere between proportionate and equal frequencies. This last 
result would be expected if samplings had been made from infinite universes. 
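The computation summarized in Table 1 can be sketched as below. It is a minimal illustration of the first solution, assuming equations (10) to (12) with Duluth as stratum 1; the St. Louis figures (31,000 graduates, initial approximation 10,000) are inferred from the column totals, and the function name `first_solution_step` is illustrative. Results need not agree with the hand-computed table to the last digit.

```python
def first_solution_step(L, N):
    """One correction step of the first solution (equal variances).

    Returns (a, new_L), where a holds the corrections a_i and
    new_L the next approximations L_i (1 + a_i).
    """
    L1, N1 = L[0], N[0]
    # phi_j = L_1 L_j (1/N_1 - 1/N_j) + L_1 - L_j, for j = 2..m
    phi = [L1 * Lj * (1 / N1 - 1 / Nj) + L1 - Lj
           for Lj, Nj in zip(L[1:], N[1:])]
    a1 = -sum(p * Lj for p, Lj in zip(phi, L[1:])) / sum(Li * Li for Li in L)
    a = [a1] + [(p + Lj * a1) / L1 for p, Lj in zip(phi, L[1:])]
    return a, [Li * (1 + ai) for Li, ai in zip(L, a)]

if __name__ == "__main__":
    cities = ["Duluth", "Birmingham", "Denver", "Seattle",
              "San Francisco", "St. Louis"]
    N = [5500, 9000, 12500, 15000, 21000, 31000]   # eighth grade graduates
    L = [4000, 5500, 6000, 6500, 8000, 10000]      # initial approximations
    for _ in range(10):
        a, L = first_solution_step(L, N)
    for city, quota, size in zip(cities, L, N):
        print(f"{city:14s} {quota:8.0f} {100 * quota / size:6.2f}%")
```

Each pass keeps the total at 40,000, so the loop can simply be repeated until the corrections become negligible.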


TABLE 1

                     8th grade  Initial   First       First     Second               Per cent
City                 graduates  approxi-  correction  approxi-  correction   Quotas  sampled
                                mation    term        mation    term
Duluth, Minn.           5,500     4,000   -.02968       3,881   -.00077       3,878    70.51
Birmingham, Ala.        9,000     5,500   -.02690       5,352   -.00164       5,343    59.37
Denver, Colo.          12,500     6,000   +.06641       6,399   +.00148       6,409    51.27
Seattle, Wash.         15,000     6,500   +.07525       6,989   +.00257       7,007    46.71
San Francisco, Cal.    21,000     8,000   +.01425       8,114   -.00341       8,086    38.50
St. Louis, Mo.         31,000    10,000   -.07349       9,265   +.00129       9,277    29.93
Total                  94,000    40,000                40,000                40,000


7. Note. In the social sciences interest centers in deriving relationships
among the various strata where each stratum is considered as a single unit. In
such cases equal precision is desired. However, if the object of research is
simply to draw contrasts between any two strata we would seek to minimize the
standard error of the difference,

$$\sigma_{\bar{x}_i - \bar{x}_j} = \sqrt{\sigma_i^2 + \sigma_j^2},$$

subject to the condition

$$\sum_{i=1}^{m} n_i = n.$$

This leads to the result

$$\frac{S_i'}{n_i} = \frac{S_j'}{n_j}.$$

Thus, the number of samplings from each stratum is, for all practical purposes,
proportional to the standard deviations, irrespective of the size of the various
strata.


WASHINGTON, D. C.

WaSHINGTON, D. C. 


ON THE COEFFICIENTS OF THE EXPANSION OF $X^{(n)}$
By J. A. JosepH 


Let us construct the following triangular arrangement of numbers:

$$\begin{array}{lllll}
1 \\
1 & 1 \\
1 & 3 & 2 \\
1 & 6 & 11 & 6 \\
1 & 10 & 35 & 50 & 24 \\
\cdots \\
1 & f_1(n-1) & f_2(n-1) \;\cdots & f_{n-2}(n-1) & f_{n-1}(n-1) \\
1 & f_1(n) & f_2(n) \;\cdots & f_{n-1}(n) & f_n(n)
\end{array}$$

where the n-th row can be constructed from the preceding row by means of the
expression

$$n\,f_i(n-1) + f_{i+1}(n-1) = f_{i+1}(n). \tag{1}$$

For example, the element 35 in the middle of the 4th row is obtained from the
two elements immediately above it, 4·6 + 11 = 35. (The top element is
counted as the zeroth row.)

The elements in the (n − 1)st row are the coefficients in the expansion of
$x^{(n)}$ as a function of x, using the notation of the calculus of finite differences.
For example,

$$x^{(4)} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x.$$

Of course, the signs of the coefficients alternate.


The function $f_i(n)$ is the sum of the products of the first n integers taken i
at a time, namely

$$f_i(n) = \sum e_1 e_2 \cdots e_i, \tag{2}$$

the summation being a symmetric function of the integers 1, 2, 3, ..., n.
Equation (1) can be written as a linear, first order difference equation,

$$\Delta f_{i+1}(n-1) = f_{i+1}(n) - f_{i+1}(n-1) = n\,f_i(n-1), \tag{3}$$

$$f_{i+1}(n-1) = \Delta^{-1}\left[n\,f_i(n-1)\right].$$

Since $f_0(n) = 1$ for all values of n, we can find $f_1(n)$, and consequently $f_2(n)$,
and so on. Thus

$$\begin{aligned}
f_1(n-1) &= \Delta^{-1}[n] = \frac{n^{(2)}}{2}, \\
f_2(n-1) &= \Delta^{-1}\left[n\,\frac{n^{(2)}}{2}\right] = \frac{3n^{(4)} + 8n^{(3)}}{24}, \\
f_3(n-1) &= \Delta^{-1}\left[n\,\frac{3n^{(4)} + 8n^{(3)}}{24}\right] = \frac{n^{(6)} + 8n^{(5)} + 12n^{(4)}}{48}.
\end{aligned} \tag{4}$$
The following theorems are true for the "triangle":

THEOREM 1. The sum of the elements in any n-th row is equal to (n + 1)!,
namely,

$$\sum_{i=0}^{n} f_i(n) = (n+1)!. \tag{5}$$

THEOREM 2. The sum of the even elements of any row is equal to the sum of the
odd elements, or

$$\sum_{i=0}^{n} (-1)^i f_i(n) = 0. \tag{6}$$
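From these coefficients we can generate the Bernoulli numbers:

$$\begin{aligned}
B_0 &= \frac{1}{2} \\
B_0 - B_1 &= \frac{2!}{3} \\
2B_0 - 3B_1 + B_2 &= \frac{3!}{4} \\
6B_0 - 11B_1 + 6B_2 - B_3 &= \frac{4!}{5} \\
24B_0 - 50B_1 + 35B_2 - 10B_3 + B_4 &= \frac{5!}{6} \\
&\;\;\vdots \\
f_n(n)B_0 - f_{n-1}(n)B_1 + f_{n-2}(n)B_2 - \cdots + (-1)^n f_0 B_n &= \frac{(n+1)!}{n+2}
\end{aligned} \tag{7}$$

Or, as a determinant:

$$|B_n| = \begin{vmatrix}
\frac{1}{2} & 1 & 0 & 0 & \cdots & 0 \\
\frac{2!}{3} & 1 & 1 & 0 & \cdots & 0 \\
\frac{3!}{4} & 2 & 3 & 1 & \cdots & 0 \\
\frac{4!}{5} & 6 & 11 & 6 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\frac{(n+1)!}{n+2} & f_n(n) & f_{n-1}(n) & f_{n-2}(n) & \cdots & f_1(n)
\end{vmatrix} \tag{8}$$

giving

$$B_0 = \tfrac{1}{2},\quad B_1 = -\tfrac{1}{6};\quad B_2 = B_4 = B_6 = \cdots = B_{2n} = 0,\quad B_3 = \tfrac{1}{30};\quad B_5 = -\tfrac{1}{42},\ \ldots.$$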
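The construction of the triangle and the successive solution of equations (7) can be sketched as follows. This is a minimal illustration using exact fractions; the function names `f_rows` and `joseph_b` are illustrative, and the numbering of the B's follows this note rather than the usual modern convention.

```python
from fractions import Fraction
from math import factorial

def f_rows(n_max):
    """Rows 0..n_max of the triangle via n*f_i(n-1) + f_{i+1}(n-1) = f_{i+1}(n)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1] + [0]  # f_n(n-1) is taken as 0
        rows.append([1] + [n * prev[i] + prev[i + 1] for i in range(n)])
    return rows

def joseph_b(n_max):
    """Solve f_n(n)B_0 - f_{n-1}(n)B_1 + ... + (-1)^n f_0 B_n = (n+1)!/(n+2)."""
    rows = f_rows(n_max)
    b = []
    for n in range(n_max + 1):
        rhs = Fraction(factorial(n + 1), n + 2)
        known = sum((-1) ** k * rows[n][n - k] * b[k] for k in range(n))
        # the coefficient of B_n is (-1)^n f_0 = +1 or -1
        b.append((rhs - known) * (-1) ** n)
    return b

if __name__ == "__main__":
    for row in f_rows(4):
        print(row)
    print(joseph_b(5))
```

Since each equation introduces exactly one new unknown with coefficient ±1, forward substitution is enough and no determinant needs to be evaluated.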


We may now take another "triangle":

$$\begin{array}{lllll}
1 \\
1 & 1 \\
1 & 3 & 1 \\
1 & 6 & 7 & 1 \\
1 & 10 & 25 & 15 & 1 \\
\cdots \\
1 & F_1(n-1) & F_2(n-1) \;\cdots & F_{n-2}(n-1) & F_{n-1}(n-1) \\
1 & F_1(n) & F_2(n) \;\cdots & F_{n-1}(n) & F_n(n)
\end{array}$$

where the n-th row is obtained from the preceding row by the expression

$$(n - i)\,F_i(n-1) + F_{i+1}(n-1) = F_{i+1}(n). \tag{9}$$

For example, from the third row: 1, 6, 7, 1, we obtain the fourth row: 1, 4·1 + 6
= 10, 3·6 + 7 = 25, 2·7 + 1 = 15, 1. The following theorem is true for
the $F_i(n)$:

THEOREM 3. The elements in the (n − 1)st row are the coefficients in the expansion
of $x^n$ as a function of the factorials $x^{(n)}$.

For example:

$$x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x^{(1)}.$$

From the generating equation (9) we can obtain, as before, the form of the
functions $F_0(n)$, $F_1(n)$, ...:

$$\Delta F_{i+1}(n-1) = F_{i+1}(n) - F_{i+1}(n-1) = (n - i)\,F_i(n-1),$$

$$F_{i+1}(n-1) = \Delta^{-1}\left[(n - i)\,F_i(n-1)\right].$$

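Since $F_0(n) = 1$ for all n,

$$\begin{aligned}
F_1(n-1) &= \Delta^{-1}[n] = \frac{n^{(2)}}{2}, \\
F_2(n-1) &= \Delta^{-1}\left[(n-1)\,\frac{n^{(2)}}{2}\right] = \frac{3n^{(4)} + 4n^{(3)}}{24}, \\
F_3(n-1) &= \Delta^{-1}\left[(n-2)\,\frac{3n^{(4)} + 4n^{(3)}}{24}\right] = \frac{n^{(6)} + 4n^{(5)} + 2n^{(4)}}{48}.
\end{aligned} \tag{10}$$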

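From these coefficients we can generate the numbers of Laplace (the numbers
$L_m$ below must be divided by m! to yield the numbers of Laplace):

$$\begin{aligned}
L_1 &= \frac{1}{2} \\
L_1 + L_2 &= \frac{1}{3} \\
L_1 + 3L_2 + L_3 &= \frac{1}{4} \\
L_1 + 7L_2 + 6L_3 + L_4 &= \frac{1}{5} \\
L_1 + 15L_2 + 25L_3 + 10L_4 + L_5 &= \frac{1}{6} \\
&\;\;\vdots \\
L_1 + F_{n-1}(n)L_2 + F_{n-2}(n)L_3 + \cdots + L_{n+1} &= \frac{1}{n+2}
\end{aligned}$$

giving

$$L_1 = \tfrac{1}{2},\quad L_2 = -\tfrac{1}{6},\quad L_3 = \tfrac{1}{4}.$$

A determinantal solution is also obvious.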
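The second triangle and the numbers $L_m$ can be generated in the same way. The sketch below assumes the recurrence (9) and the system of equations for $L_1, L_2, \ldots$ given above; the function names `big_f_rows` and `laplace_l` are illustrative.

```python
from fractions import Fraction
from math import factorial

def big_f_rows(n_max):
    """Rows 0..n_max via (n - i) F_i(n-1) + F_{i+1}(n-1) = F_{i+1}(n)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1] + [0]  # F_n(n-1) is taken as 0
        rows.append([1] + [(n - i) * prev[i] + prev[i + 1] for i in range(n)])
    return rows

def laplace_l(count):
    """L_1..L_count from L_1 + F_{n-1}(n) L_2 + ... + L_{n+1} = 1/(n+2)."""
    rows = big_f_rows(count - 1)
    ls = []
    for n in range(count):
        # in row n the coefficient of L_j (j = 1..n+1) is F_{n+1-j}(n)
        known = sum(rows[n][n + 1 - j] * ls[j - 1] for j in range(1, n + 1))
        ls.append(Fraction(1, n + 2) - known)
    return ls

if __name__ == "__main__":
    ls = laplace_l(4)
    print(ls)
    # dividing L_m by m! gives the numbers of Laplace
    print([l / factorial(m) for m, l in enumerate(ls, start=1)])
```

As with the Bernoulli numbers, each equation brings in one new unknown with unit coefficient, so the values follow by forward substitution.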


CALIFORNIA INSTITUTE OF TECHNOLOGY. 



























ON THE PROBABILITY OF ATTAINING A GIVEN STANDARD 
DEVIATION RATIO IN AN INFINITE SERIES OF TRIALS 


By JosepH A. GREENWOOD AND T. N. E. GREVILLE 


Suppose an event with constant probability p of occurrence to be repeated an
infinite number of times, and suppose the ratio of the deviation from the expected
number of successes to the standard deviation $\sqrt{npq}$ to be recomputed
after each trial. We are interested in the probability that this ratio will at
some time equal or exceed some positive number k. It is not difficult to show
that the value of this probability is unity, but as the fact has not, to our knowledge,
been previously pointed out in the literature, we give the following proof.

Let $x_n$ denote the number of successes obtained in the first n trials, let

$$t_n = \frac{x_n - np}{\sqrt{npq}},$$

and let P denote the probability that, for some n, $t_n \ge k$. We shall prove that
P = 1. To do this, let the infinite series of trials be subdivided into consecutive,
mutually exclusive subseries of finite length, and let $m_i$ denote the number
of trials in the i-th subseries. Let $N_i = \sum_{j=1}^{i-1} m_j$ for $i \ge 2$, while $N_1 = 0$. Let $k'$
be any number greater than k, and let $m_i$ be so chosen that, for every i,

$$m_i \ge \frac{k'^2 p}{q}, \tag{1}$$

$$\sqrt{m_i}\left(k' - k\sqrt{\frac{N_i}{m_i} + 1}\right) > N_i\sqrt{\frac{p}{q}}. \tag{2}$$

It follows from (1) that

$$m_i \ge m_i p + k'\sqrt{m_i p q} \quad \text{for every } i. \tag{3}$$

It follows from (2) that

$$m_i p + k'\sqrt{m_i p q} \ge (N_i + m_i)p + k\sqrt{(N_i + m_i)pq} \quad \text{for every } i. \tag{4}$$

Let $y_i$ denote the number of successes in the i-th subseries. It is evident from
(4) that if

$$y_i \ge m_i p + k'\sqrt{m_i p q} \tag{5}$$

for any i, then

$$t_{N_i + m_i} \ge k.$$

Hence P is at least equal to the probability that (5) holds for some i.
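The choice of subseries lengths can be made concrete. The following is a minimal sketch that, for given p, k and k′, picks for each subseries the smallest length satisfying conditions (1) and (2); choosing the smallest admissible length, the function name `subseries_lengths`, and the parameter values are all illustrative assumptions, since the proof only requires that some admissible length exist.

```python
import math

def subseries_lengths(p, k, k_prime, count):
    """Smallest m_i satisfying conditions (1) and (2) for the first `count` subseries."""
    q = 1 - p

    def admissible(m_i, n_i):
        cond1 = m_i >= k_prime ** 2 * p / q
        cond2 = (math.sqrt(m_i) * (k_prime - k * math.sqrt(n_i / m_i + 1))
                 > n_i * math.sqrt(p / q))
        return cond1 and cond2

    lengths, n_i = [], 0  # n_i plays the role of N_i, with N_1 = 0
    for _ in range(count):
        hi = 1
        while not admissible(hi, n_i):   # both conditions are monotone in m_i
            hi *= 2
        lo = hi // 2                     # not admissible (or 0 when hi == 1)
        while hi - lo > 1:               # bisect down to the smallest admissible length
            mid = (lo + hi) // 2
            if admissible(mid, n_i):
                hi = mid
            else:
                lo = mid
        lengths.append(hi)
        n_i += hi
    return lengths

if __name__ == "__main__":
    print(subseries_lengths(p=0.5, k=2.0, k_prime=3.0, count=4))
```

The lengths grow very quickly, roughly squaring from one subseries to the next, which is harmless for the argument because only finiteness of each $m_i$ is needed.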
Let $p_i$ denote the probability that (5) holds for a particular i. It follows
from (3) that, for every i, $p_i > 0$. Moreover, there exists a positive integer M
and a number h > 0, such that if $m_i \ge M$, $p_i \ge h$.¹ Since there is but a finite
number of possible values of $m_i$ less than M, there is a number $p_0 > 0$ such that
$p_i \ge p_0$ for every i. Hence the probability that (5) holds for no value of i
is at most

$$\lim_{s \to \infty} (1 - p_0)^s = 0.$$

Therefore, the probability that (5) holds for some i is unity.


DUKE UNIVERSITY
AND
UNIVERSITY OF MICHIGAN.


¹ Uspensky, J. V., Introduction to Mathematical Probability, p. 129.








